Abstract

The first part of the dissertation investigates the application of the theory of large
random matrices to high-dimensional inference problems when the samples are drawn
from a multivariate normal distribution. A longstanding problem in sensor array processing
is addressed by designing an estimator for the number of signals in white noise
that dramatically outperforms that proposed by Wax and Kailath. This methodology is
extended to develop new parametric techniques for testing and estimation. Unlike techniques
found in the literature, these exhibit robustness to high-dimensionality, sample
size constraints and eigenvector misspecification.

By  interpreting  the  eigenvalues  of  the  sample  covariance  matrix  as  an  interacting 
particle system, the existence of a phase transition phenomenon in the largest (“signal”)
eigenvalue  is  derived  using  heuristic  arguments.  This  exposes  a  fundamental  limit  on 
the  identifiability  of  low-level  signals  due  to  sample  size  constraints  when  using  the 
sample  eigenvalues  alone. 

The  analysis  is  extended  to  address  a  problem  in  sensor  array  processing,  posed  by 
Baggeroer  and  Cox,  on  the  distribution  of  the  outputs  of  the  Capon-MVDR  beamformer 
when the sample covariance matrix is diagonally loaded.

The  second  part  of  the  dissertation  investigates  the  limiting  distribution  of  the 
eigenvalues  and  eigenvectors  of  a  broader  class  of  random  matrices.  A  powerful  method 
is  proposed  that  expands  the  reach  of  the  theory  beyond  the  special  cases  of  matrices 
with  Gaussian  entries;  this  simultaneously  establishes  a  framework  for  computational 
(non-commutative) “free probability” theory.

The class of “algebraic” random matrices is defined and the generators of this class
are specified. Algebraicity of a random matrix sequence is shown to act as a certificate
of the computability of the limiting eigenvalue distribution and, for a subclass, the limiting
conditional “eigenvector distribution.” The limiting moments of algebraic random
matrix sequences, when they exist, are shown to satisfy a finite depth linear recursion
so  that  they  may  often  be  efficiently  enumerated  in  closed  form.  The  method  is  applied 
to  predict  the  deterioration  in  the  quality  of  the  sample  eigenvectors  of  large  algebraic 
empirical  covariance  matrices  due  to  sample  size  constraints. 

Thesis  Supervisor:  Alan  Edelman 
Title:  Professor  of  Applied  Mathematics 
Chapter 1


Foreword 


This  thesis  applies  random  matrix  theory  to  high-dimensional  inference  problems.  For 
measurements  modeled  as 

x_i = A s_i + z_i,

the term “high-dimensional” refers to the dimension n of the vector x_i. In array processing
applications such as radar and sonar, where the elements of x_i are interpreted as
spatial observations, n can range from ten up to a few thousand. The elements of x_i are
modeled as random variables and have different interpretations in different applications
but the core problem can be succinctly summarized:

How does one use m samples (measurements), x_1, ..., x_m, to estimate, as
accurately as possible, the n-by-k matrix A or the k-by-1 vectors s_1, ..., s_m,
or both, in the presence of random noise z_i?

In  array  processing  applications  such  as  radar  and  sonar,  accurate  estimation  of  the 
matrix A leads to a commensurately accurate estimation of the location of an airplane.
In  an  application,  referred  to  as  the  “cocktail  party  problem,”  [44]  a  sensor  array  is  used 
to  estimate  A  and  hence  the  positions  of  persons  speaking  in  a  room;  this  information 
is  then  used  to  isolate  the  voices  of  the  various  speakers. 

Variations  of  this  setup  abound  in  applications  such  as  time-series  analysis,  wireless 
communications,  econometrics,  geophysics,  and  many  more.  Consequently,  this  problem 
has been formulated, and “solved” by many research communities. Almost all the
traditional  solutions  assume,  however,  that  there  are  enough  data  samples  available, 
relative  to  the  number  of  sensors,  so  that  an  accurate  statistical  characterization  can 
be  performed  on  the  measured  data.  When  the  number  of  sensors  is  relatively  small 
(less  than  8)  this  assumption  is  reasonable.  However,  as  we  keep  adding  sensors,  this 
assumption  is  violated  so  that  traditional  algorithms  perform  considerably  worse  than 
expected. 

This curse of high-dimensionality seemingly contradicts our expectation (hope, really)
that adding more sensors translates into improved performance. Taking more
samples is often not an option because of the time-varying nature of the problem (e.g.,
tracking an airplane). Thus, devising techniques to counteract this effect will have a
positive impact on many areas. This is where random matrices become relevant.

The n-by-m matrix obtained by stacking the m measurements made at n sensors
alongside each other is a “random matrix” because its elements are random variables.
As the dimensions of the matrix get large a remarkable phenomenon occurs: the behavior
of a large class of random matrices becomes increasingly non-random in a manner that
can be predicted analytically. In fact, the larger the size of the matrix, the lesser the
unpredictability, i.e., the magnitude of the “fluctuation.”

This  observation  is  the  starting  point  for  our  research.  The  hypothesis  we  explore 
in  this  thesis  is  that  high-dimensionality  in  such  settings  might  just  be  a  blessing 
provided  the  underlying  model  is  physically  justifiable  and  the  non-random  part  can 
be  concretely  predicted  and  taken  advantage  of.  One  of  the  main  contributions  of  this 
thesis is the development of a framework based on this philosophy (see Chapters 6 - 10)
to design implementable estimation and hypothesis testing algorithms (see Chapters 3
and  4)  for  physically  motivated  random  matrix  models. 

In a setting of interest to many research communities, we are able to characterize
the fundamental limit of these techniques: signals can be reliably detected using these
techniques only if their power is above a threshold that is a simple function of the
noise power, the dimensionality of the system and the number of samples available (see
Chapter 3).

Along the way, we unlock the power of “free probability,” the mathematical theory
that reveals the hidden structure lurking behind these high-dimensional objects. While
in the past the non-random behavior for large random matrices could only be predicted
for some special cases, the computational tools we develop ensure that concrete predictions
can be made for a much broader class of matrices than thought possible. The tools
reveal the full power of the theory in predicting the global behavior of the eigenvalues
and  eigenvectors  of  large  random  matrices  (see  Chapters  6  -  10). 

The statistical techniques developed merely scratch the surface of this theory. Our
hope in presenting the software version of the “free probability” or random matrix
“calculator,” alongside the mathematics that facilitates the computational realization,
is that readers will take the code as a starting point for their own experimentation and
develop additional applications of the theory on which our ideas are based. Readers
interested  in  this  latter  framework  may  proceed  directly  to  Chapter  6  and  skip  the 
preceding  application  oriented  material. 

We  provide  an  overview  of  the  sample  covariance  matrix  based  inference  problems 
in  signal  processing  in  Chapter  2.  Our  point  of  departure  will  be  inference  problems  in 
sensor  array  processing.  Practitioners  in  other  areas  of  science  and  engineering  should 
easily  be  able  to  adapt  the  proposed  techniques  to  their  applications. 


Chapter  2 


Introduction 


The  statistical  theory  of  signal  processing  evolved  in  the  1930’s  and  1940’s,  spurred 
in  large  part  by  the  successful  consummation  of  mathematical  theory  and  engineering 
practice  [112].  Correlation  techniques  for  time  series  analysis  played  a  key  role  in  the 
mathematics  developed  at  the  time  by  Wiener  and  colleagues.  To  quote  Professor  Henry 
J. Zimmerman, the Director of MIT’s Research Lab of Electronics, (italics added for
emphasis)

“... the potential significance of correlation techniques had fired the imagination
... the general enthusiasm was due to experimental evidence ... that
weak signals could be recovered in the presence of noise using correlation
techniques. From that point on the field evolved very rapidly [119].”

Covariance matrix based methods were the natural extension of correlation techniques
to multi-channel signal processing algorithms and remain widely used to this
day [102]. Array processing applications involving radar and sonar were amongst the
first to use such techniques for tasks such as detection, estimation, and classification.
Representative applications include detecting airplanes, estimating environmental
parameters  using  an  array  of  sensors,  and  classifying  objects  based  on  surface  reflections 
received  at  a  sensor  bank. 

■  2.1  Role  of  sample  covariance  matrices  in  signal  processing 

Typically,  since  the  true  covariance  matrix  is  unknown,  a  sample  covariance  matrix  is 
used.  Hence,  many  modern  multichannel  signal  processing  algorithms  used  in  practice 
can  be  labelled  as  sample  covariance  matrix  based.  The  role  of  random  matrices  enters 
because  of  the  statistical  characterization  of  the  sample  covariance  formed  by  summing 
over the outer products of the m observation (or “snapshot”) vectors x_1, ..., x_m when
forming the n x n sample covariance matrix R as

R = \frac{1}{m} X X',  (2.1)

where the ' denotes the Hermitian transpose and the data matrix X = [x_1 | ... | x_m]
is an n x m matrix whose rows represent measurements made at the sensors (spatial
and/or frequency samples) and columns represent snapshots (time samples). When the
snapshots are modelled as a multivariate Gaussian with covariance R, then the random
sample covariance matrix in (2.1) is an instance of the Wishart matrix [114] extensively
studied by statisticians.
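
As a minimal sketch of the construction in (2.1), the following forms a sample covariance matrix from simulated snapshots. The sizes n and m, the white-noise (identity covariance) model, and the helper name `sample_covariance` are assumptions chosen for illustration; they do not come from the thesis software.

```python
import numpy as np

def sample_covariance(X):
    """Form the n x n matrix R = (1/m) X X' from an n x m data matrix X."""
    n, m = X.shape
    return (X @ X.conj().T) / m

rng = np.random.default_rng(0)
n, m = 4, 2000  # hypothetical sizes: 4 sensors, 2000 snapshots

# Complex Gaussian snapshots with identity covariance (white noise only).
X = (rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))) / np.sqrt(2)
R = sample_covariance(X)

print(R.shape)                         # (4, 4)
print(np.allclose(R, R.conj().T))      # R is Hermitian by construction
print(np.linalg.norm(R - np.eye(n)))   # small here because m is much larger than n
```

With m much larger than n the sample covariance sits close to the true covariance; shrinking m toward n in this sketch makes the discrepancy visibly larger.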


■  2.1.1  Statistical inference from random sample covariance matrices

Inference techniques posed on sample covariance matrices with the Wishart distribution
include algorithms for testing hypotheses (e.g., is there an airplane present?) and
estimating values of parameters (e.g., where is the airplane located?). In terms of
the  sample  covariance  matrix,  these  algorithms  can  be  classified  as  either  exploiting 
the  eigenvector  structure  of  the  (assumed)  true  covariance  matrix  or  the  eigenvalue 
structure.  When  the  physics  of  the  operating  environment  are  adequately  modelled,  a 
maximum  likelihood  technique  can  be  used  to  estimate  the  unknown  parameters.  When 
algorithmic  computational  complexity  is  a  factor,  estimation  in  these  settings  is  often 
reduced to the computation of a weight vector, which is given (up to a scale factor) by

w \propto R^{-1} v,  (2.2)

where v is termed a replica, a matching signal vector, or a spatial matched filter.

In recursive methods this weight is computed dynamically as data is accumulated
with a “forgetting” factor which decreases the influence of older data; for example, using
recursive least squares algorithms. Regardless of the method, the underlying problem in
statistical signal processing [7] is that the non-stationarity and/or inhomogeneity of the
data limits the number of samples m which can be used to form the sample covariance
matrix R. This non-stationarity/inhomogeneity can be caused by the motion of ships,
aircraft, satellites, geophysical and/or oceanographic processes and regions; in other
words, it is often inherent to the operating environment and cannot be “designed away.”

Examining the weight vector computation in (2.2) more carefully reveals why we
label such techniques as exploiting eigenvector information. The weight vector can be
written in terms of the projection of the replica vector v onto the sample eigenspace as


w \propto \sum_{i=1}^{n} \frac{1}{\lambda_i} u_i \langle u_i, v \rangle.  (2.3)

Clearly, as the expression in (2.3) indicates, the computation of the weight vector is
directly affected by the projection \langle u_i, v \rangle of the signal bearing vector v onto the sample
eigenvectors u_i.
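
The equivalence between the direct form (2.2) and the eigen-expansion (2.3) can be checked numerically. The sketch below uses real-valued data for simplicity (so the Hermitian transpose reduces to an ordinary transpose); the sizes and the randomly drawn replica vector v are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 50                          # hypothetical sizes with m > n, so R is invertible
X = rng.standard_normal((n, m))
R = X @ X.T / m                       # sample covariance, as in (2.1)
v = rng.standard_normal(n)            # hypothetical replica vector

w_direct = np.linalg.solve(R, v)      # R^{-1} v, as in (2.2)

lam, U = np.linalg.eigh(R)            # sample eigenvalues; columns of U are the u_i
proj = U.T @ v                        # projections <u_i, v>
w_eig = U @ (proj / lam)              # sum_i (1/lambda_i) u_i <u_i, v>, as in (2.3)

agree = np.allclose(w_direct, w_eig)
print(agree)
```

Because each term is scaled by 1/lambda_i, the smallest sample eigenvalues carry the largest weights, which is why errors in the corresponding sample eigenvectors matter so much.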

There is a whole class of statistical inference problems involving detection and estimation
that do not rely on eigenvector information. Here, inference is performed directly
on the eigenvalues of the sample covariance matrix. Examples include the ubiquitous (in
signal processing applications) Wax-Kailath estimator [111] for the number of signals
in white noise and Anderson’s tests and estimators developed in his landmark paper on
principal components analysis [6].

These “classical” techniques exploit the large sample size asymptotics of the eigenvalues
of Wishart distributed sample covariance matrices. However, the asymptotics
developed are not adequate for high-dimensional settings so that they only work well
when m \gg n.

An early motivation for such eigenvalue based inference techniques (“eigen-inference”)
was the computational savings relative to maximum likelihood methods that incorporate
information about the eigenvectors of the underlying covariance matrix. The inference
methodologies developed in this thesis fall in the category of posing the estimation and
hypothesis testing problem on the sample eigenvalues alone. Since we discard the information
in the eigenvectors, this necessarily compromises their performance relative
to algorithms that use high-quality parametric models for the eigenvectors. Conversely,
this provides the justification for “robustifying” eigenvector dependent inferential algorithms
when the models for the eigenvectors are of uncertain quality, which is often the
case in high-dimensional settings.


■  2.1.2  Algorithmic performance measures

Regardless  of  whether  the  estimators  exploit  eigenvector  information  or  not,  for  signal 
processing  applications  involving  parameter  estimation,  the  mean  square  estimation 
error is a commonly used performance metric. A common practice is to compare the
simulated performance of an algorithm with the Cramer-Rao lower bound [25,72] since
the latter is the theoretically optimal performance achievable by the best possible (unbiased)
estimator. Figure 2-1 shows a typical mean square error performance plot for
an estimator as a function of the level of the signal (or parameter) that is being estimated.
For asymptotically large signal levels, the performance of most algorithms
matches the Cramer-Rao lower bound unless there is a saturation in performance because
of a model misspecification [115,116]. The latter issue motivates the bulk of the
algorithms developed in this thesis - we shall return to it shortly.

There are three regimes in plots such as Figure 2-1 that need to be distinguished.
The performance loss of an algorithm, as shown in Figure 2-1, is measured with respect
to the difference between the achieved mean square estimation error and the Cramer-Rao
lower bound.

In Figure 2-1, Regime III is referred to as the asymptotic regime and is characterized
by an approximately linear (on appropriate rescaling) behavior. Regime II is characterized
by a rapid, highly nonlinear deterioration in the performance of the algorithm - a
phase transition, as it were. This breakdown has been observed and studied by several
authors [43,52,60,71,100,101,116] and is referred to as a threshold effect [103]. The
exact signal level where it occurs depends on the algorithm in question and can be complicated
to compute [74,118]. Different signal processing algorithms behave differently
in this regime - some suffer more gradual deterioration in performance than others. The
onset of this regime is characterized by ambiguous sidelobes in the parameter space accompanied
by a deterioration in the reliability of the sample eigenvectors [43,101,103].
Regime I, sometimes referred to as the no information regime, occurs when the sidelobes
in the parameter space lead to highly biased estimates leading to unacceptably high
estimation error. Regimes II and III are thus of interest when designing algorithms.

[Figure 2-1 plot. Vertical axis: Mean Square Estimation Error. Regimes marked: I, II, III.]

Figure 2-1. Typical performance plot for the mean square estimation error of a signal processing
algorithm. Three broad regimes that characterize the performance relative to the signal level are
identified. Model mismatch refers to actual physical model differing from the assumed model. This
behavior has been well documented by many authors including [115,116].

The  utility  of  such  plots  in  practice  comes  from  their  being  able  to  indicate  how 
well the chosen algorithm is able to detect and estimate low level signals. When there
are sample size constraints, there is a deterioration in performance; this is referred to
in array processing literature as the “snapshot problem,” which we discuss next.

■  2.1.3  Impact  of  sample  size  constraints 

For multi-channel signal processing algorithms the realized performance and the Cramer-Rao
lower bound are, roughly speaking, a function of the true covariance matrix (which
encodes the signal level and the number of sensors n) and the number of snapshots
m [90]. It is quite natural to characterize the performance as a function of the ratio
n/m of the number of sensors to the number of snapshots. For a fixed number of sensors
n the performance of the algorithm for a chosen signal level improves as the number of
snapshots increases. In other words, for a fixed n as m \to \infty, the ratio n/m \to 0 and
the performance improves.

For array processing applications, there are two well-known results that capture this
analytically - however, only in the scenario when the sample support exceeds the number
of sensors, i.e., m > n. The first, known as the Capon-Goodman result [20], states that
the energy, or mean square value, of the projection of the weight vector w for the so-called
Capon algorithm [19] onto a data vector has a complex chi-squared distribution
with m - n + 1 degrees of freedom where, as usual, m is the number of snapshots and n
is the number of sensors. The second, under the same conditions, characterizes the signal to
noise ratio, where the signal is the replica v scaled by its amplitude and the noise level
is the power obtained using the Capon-Goodman result. The result states that the
power has a beta distribution, leading to the often quoted Reed-Brennan-Mallett result [18]
that one needs the ratio n/m between 1/2 and 1/3 for obtaining “adequate” signal to
noise ratio.

There  are  many  current  applications  where  meeting  this  sample  support  requirement 
is  just  not  possible.  In  large  array  processing  applications  in  radar  and  sonar,  the  ratio 
of the number of sensors to snapshots is between 2 and 100, though often it is much
more.  An  immediate  consequence  of  this  for  estimation  problems  posed  in  terms  of  the 
eigenvectors  is  that  the  weight  vector  cannot  be  computed  using  (2.2)  since  the  sample 
covariance  matrix  R  given  by  (2.1)  is  singular.  Practical  strategies  have  been  in  place 
since  the  mid  1960’s  to  overcome  this. 
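
The singularity noted above is easy to reproduce. In the following sketch (real-valued data and hypothetical sizes n = 20, m = 5, chosen only for illustration), the sample covariance matrix has rank at most m, so n - m of its eigenvalues are numerically zero and the inverse needed in (2.2) does not exist.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 20, 5                           # more sensors than snapshots
X = rng.standard_normal((n, m))
R = X @ X.T / m                        # sample covariance, as in (2.1)

rank = np.linalg.matrix_rank(R)        # cannot exceed m
eigvals = np.linalg.eigvalsh(R)        # eigenvalues in ascending order
num_zero = int(np.sum(eigvals < 1e-10))

print(rank, num_zero)                  # 5 15
```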


■  2.1.4  Diagonal  loading  and  subspace  techniques 

Several fields have developed a number of fixes to the problems that arise when R is
singular. Two methods dominate the approaches. In the first, the sample covariance
matrix is “diagonally loaded” [24]. The sample covariance matrix R in (2.2) is replaced
with a diagonally loaded R_\delta given by

R_\delta = R + \delta I,  (2.4)

so that the weight vector computation in (2.3) becomes

w \propto \sum_{i=1}^{n} \frac{1}{\lambda_i + \delta} u_i \langle u_i, v \rangle.  (2.5)

Sometimes this is termed “ridge regression” [47], “regularization,” “shrinkage parameter” [35]
or “white noise gain control” [102]. It appears that every application has its
own vocabulary. This approach has the impact of putting a “floor” on the low eigenvalues
so when the inverse is taken, they do not dominate the solution. The choice of
the diagonal loading or regularization parameter \delta is an important factor that affects
the statistical robustness and the sensitivity of the underlying algorithm.
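
A short sketch of diagonal loading, under assumed values: the sizes, the loading level delta = 0.1, and the helper name `loaded_weights` are all hypothetical. It checks that solving with R + delta*I matches the eigen-expansion in (2.5), and that delta acts as a floor on the eigenvalues of the loaded matrix even though R itself is singular.

```python
import numpy as np

def loaded_weights(R, v, delta):
    """Weight vector (up to scale) computed from the loaded matrix R + delta*I."""
    n = R.shape[0]
    return np.linalg.solve(R + delta * np.eye(n), v)

rng = np.random.default_rng(3)
n, m = 20, 5
X = rng.standard_normal((n, m))
R = X @ X.T / m                        # singular, since m < n
v = rng.standard_normal(n)             # hypothetical replica vector
delta = 0.1                            # hypothetical loading level

w_loaded = loaded_weights(R, v, delta)

lam, U = np.linalg.eigh(R)
w_eig = U @ ((U.T @ v) / (lam + delta))    # eigen-expansion form of (2.5)

# Every eigenvalue of the loaded matrix is at least delta (up to rounding).
floor_ok = np.min(np.linalg.eigvalsh(R + delta * np.eye(n))) >= delta - 1e-9
print(np.allclose(w_loaded, w_eig), floor_ok)
```

The sample eigenvectors are untouched by the loading; only the weights 1/(lambda_i + delta) applied to them change.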

The second approach is based on subspaces. Most often, the sample eigenvalues, \lambda_i
for i = 1, 2, ..., n, are ordered and only those above a threshold are used. This is termed
dominant mode rejection (DMR) [69]. The processing is then done on the remaining
subspace, either with each sample eigenvector/eigenvalue or with an appropriate transformation
to reduce the signal dimensionality. The issue here is to establish the signal
subspace; once done, most of the existing algorithms can be used. Some of these approaches
are discussed in Van Trees [102]. In other words, the resulting weight vector
computation can be expressed as:

w_{ss}(\gamma) \propto \sum_{i=1}^{n} ( g(\lambda_i) \chi_\gamma(\lambda_i) ) u_i \langle u_i, v \rangle,  (2.6)

where g(\cdot) is an appropriate function that depends on the algorithm used, and \chi_\gamma(\lambda_i)
is the indicator function that is equal to 1 when \lambda_i > \gamma > 0 and 0 otherwise.
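
The thresholded weighting in (2.6) can be sketched as follows. The choice g(lambda) = 1/lambda is an assumption made purely for illustration (the text leaves g algorithm dependent), and the function name `subspace_weights` and the toy diagonal matrix are likewise hypothetical.

```python
import numpy as np

def subspace_weights(R, v, gamma, g=lambda lam: 1.0 / lam):
    """Weight vector (up to scale) in the form of (2.6).

    g defaults to 1/lambda, which is an illustrative assumption only.
    """
    lam, U = np.linalg.eigh(R)          # sample eigenvalues and eigenvectors
    keep = lam > gamma                  # indicator chi_gamma(lambda_i)
    coeffs = np.zeros_like(lam)
    coeffs[keep] = g(lam[keep])         # g is evaluated only on retained eigenvalues
    w = U @ (coeffs * (U.T @ v))        # sum_i g(lambda_i) chi_gamma(lambda_i) u_i <u_i, v>
    return w, int(keep.sum())

# Toy example: eigenvalues 4, 1 and 0; the zero eigenvalue falls below the threshold.
R = np.diag([4.0, 1.0, 0.0])
v = np.ones(3)
w, kept = subspace_weights(R, v, gamma=0.5)
print(w, kept)
```

In this toy case two eigenvalues survive the threshold and the resulting weight vector is (0.25, 1, 0) up to rounding; the discarded direction contributes nothing, so the singular eigenvalue never has to be inverted.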

Other  variants  of  these  processing  techniques  are  also  found  in  practice  and  in  the 
literature.  Van  Trees  [102]  and  Scharf  [78]  are  good  references  on  this  and  other  related 
subjects.  A  common  thread  in  all  of  these  techniques,  that  can  be  discerned  from  the 
expressions  in  (2.5)  and  (2.6),  is  that  they  essentially  represent  different  schemes  for 
weighting  the  contribution  of  the  individual  sample  eigenvectors  in  the  computation  of 
the weight vector w. Such fixes affect the performance of eigenvector based parameter
estimation  algorithms. 

Analyzing the distribution of the outputs when a diagonally loaded sample covariance
matrix (SCM) is used for computing the weight vector is the first step in analyzing its
impact on performance. The lack of analytical results in this direction has been an
outstanding problem for a while
in the community [7]. In Chapter 5 we provide the first such analytical results for
the Capon beamformer under diagonal loading. For inference problems posed using the
eigenvalues alone, diagonal loading and other schemes are of little use since they modify
the eigenvalues in a predictable manner. Hence, practitioners continue to use the Wax-Kailath
estimator [111] and the algorithms proposed by Anderson [6] for eigenvalue
based inference even though they are clearly inadequate in the high-dimensional, sample size
constrained settings found in an increasing number of applications.

■  2.1.5  High-dimensional statistical inference

There  are  many  existing  applications  that  already  operate  in  a  severely  sample  size 
constrained  regime.  Currently,  engineers  and  scientists  deploy  array  processing  systems 
with a very large number of sensors. Arrays with up to 6000 geophones are now used
in geophysical exploration for oil; US Navy towed arrays now have 100 to 1000 sensors;
phased array radars have 100’s of dipoles or turned helices. The current state of arrays
now  stretches  the  snapshot  support  and  future  ones  certainly  will  only  exacerbate  the 
situation  further. 

In adaptive array processing applications, we are already in a situation where sample
covariance matrix based estimators that rely on Anderson's eigen-analysis for m \gg n
perform inadequately. In situations with sample size constraints where the model is
misspecified to the extent that eigenvector information is no longer reliable, we often
witness performance saturation as depicted in Figure 2-1. It is important to develop
sample eigenvalue based inference algorithms that supplant the methodologies proposed
in [6] and [111]. One of the contributions of this thesis (see Chapters 3-4) is the development
of such algorithms that are robust to high-dimensionality, sample size constraints
and eigenvector misspecification.

We note that many practically important estimation problems can only be formulated
in terms of the sample eigenvectors; in such cases the parameter of interest (e.g.,
location of the airplane) resides in the “noisy” sample eigenvector. In such applications,
parameter estimation is posed as a joint detection-estimation problem: the number of
signals is first estimated so that the parameters associated with the signals can then
be extracted. The former problem can be posed in the eigenvalue domain only and
the techniques we have developed (see Chapter 3) will outperform other techniques in
the literature. They will not, however, help improve the subsequent parameter estimation
problem which will still be impacted by the sample eigenvector degradation due to
sample size constraints. However, we note that in this thesis we have made noteworthy
progress in this direction. We have developed the first computational framework for
analyzing eigenvectors of a large class of random matrices (see Chapter 10) including
those with Wishart distribution. We believe that this should pave the way for the development
of new eigenvector based inference methodologies that are similarly robust to
high-dimensionality and eigenvector misspecification.


■  2.2  Random  matrix  theory 


It is worth emphasizing the nature of the stochastic eigen-analysis results being exploited.
Finite random matrix theory, of the sort found in references such as Muirhead [66],
is concerned with obtaining exact characterizations, for every n and m, of the
distribution  of  the  eigenvalues  of  random  sample  covariance  matrices.  Consequently, 
the  finite  random  matrix  theory  results  often  focus  on  the  Wishart  distribution. 

Infinite random matrix theory, on the other hand, is concerned with the characterization
of the limiting distribution of the eigenvalues of random matrices. By posing the
question in the asymptotic (with respect to n) regime concrete answers can be obtained
for a much larger class of random matrices than can be handled by finite random matrix
theory. 

The  theory  of  large  random  matrices  arises  naturally  because  the  inference  problems 
we  are  interested  in  are  inherently  high-dimensional.  In  that  regard,  a  central  object 
in  the  study  of  large  random  matrices  is  the  empirical  distribution  function  which  is 
defined, for an N x N matrix A_N with real eigenvalues, as

F^{A_N}(x) = \frac{\text{Number of eigenvalues of } A_N \leq x}{N}.  (2.7)


For a large class of random matrices, the empirical distribution function F^{A_N}(x) converges,
for every x, almost surely (or in probability) as N \to \infty to a non-random
distribution function F^{A}(x). In practice, N \approx 8 is “good enough” in the sense that
the empirical histogram of the eigenvalues will very well approximate the distributional
derivative of the limiting distribution function. The early literature on this subject
used matrix theoretic arguments to determine the class of random matrices for which
the limiting eigenvalue distribution could be determined. The techniques first used by
Marčenko-Pastur [59] and later perfected by Silverstein [83] and Girko [40] formed the
foundation of these investigations (see [8] for a comprehensive review). Despite being
able to characterize a rather broad class of practically applicable random matrices, the
derivations had to be done on a case-by-case basis so that it was not clear if there was
deeper structure in the random matrices that could permit “universal” computation.
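
The empirical distribution function in (2.7), and the sense in which it becomes non-random, can be illustrated with a small experiment. The sketch below evaluates F at a single point for two independent sample covariance matrices of the same size and records how far apart the two values are. The white-noise model, the fixed ratio n/m = 1/2, the evaluation point x = 1, and the helper names are all assumptions made for illustration.

```python
import numpy as np

def empirical_distribution(A, x):
    """F^{A_N}(x): fraction of eigenvalues of the Hermitian matrix A that are <= x."""
    eigvals = np.linalg.eigvalsh(A)
    return float(np.mean(eigvals <= x))

def white_noise_sample_cov(n, m, rng):
    """Sample covariance of m real Gaussian snapshots with identity covariance."""
    X = rng.standard_normal((n, m))
    return X @ X.T / m

rng = np.random.default_rng(4)
x = 1.0
gaps = {}
for n in (8, 64, 512):
    m = 2 * n                                    # hypothetical fixed ratio n/m = 1/2
    F1 = empirical_distribution(white_noise_sample_cov(n, m, rng), x)
    F2 = empirical_distribution(white_noise_sample_cov(n, m, rng), x)
    gaps[n] = abs(F1 - F2)                       # disagreement between independent draws
    print(n, F1, F2)
```

For the larger sizes the two independent realizations should give nearly the same value of F, which is the concentration the text describes; the specific numbers depend on the seed.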


■  2.2.1  Beyond  special  cases:  Free  probability  theory 

The development of “free probability” by Voiculescu in the mid-1980’s changed all that
by pinpointing the structure behind these high-dimensional objects that permits computation.
Free probability has since emerged as a counterpart to “classical” probability.
Some good references are [16,46,68,109]. These references and even the name “free
probability” are worthy of some introduction.

We begin with a viewpoint on classical probability. If we are given probability
densities $f$ and $g$ for random variables $X$ and $Y$ respectively, and if we know that $X$
and $Y$ are independent, we can compute the moments of $X + Y$ and $XY$, for example,
from the moments of $X$ and $Y$.
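The classical statement can be made concrete with a short sketch. Assuming the moments of two independent scalar random variables are stored as lists (a representation chosen here purely for illustration), the moments of the sum follow from the binomial theorem and those of the product factorize. The function names are hypothetical.

```python
from math import comb


def sum_moments(mx, my):
    """Moments E[(X+Y)^k] from E[X^j] and E[Y^j] for independent X and Y.

    mx[j] and my[j] hold the j-th moments, with mx[0] = my[0] = 1.
    """
    order = min(len(mx), len(my))
    return [sum(comb(k, j) * mx[j] * my[k - j] for j in range(k + 1))
            for k in range(order)]


def product_moments(mx, my):
    """Moments E[(XY)^k] = E[X^k] E[Y^k] for independent X and Y."""
    return [a * b for a, b in zip(mx, my)]


# Moments of a standard normal variable up to order 4: 1, 0, 1, 0, 3.
normal = [1, 0, 1, 0, 3]
print(sum_moments(normal, normal))      # moments of a normal with variance 2
print(product_moments(normal, normal))
```

Free probability plays the same role for matrices: the binomial rule above is replaced by free additive and multiplicative convolution because the matrices do not commute.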

Our viewpoint on free probability is similar. Given two random matrices, $A$ and
$B$, with eigenvalue density functions $f$ and $g$, we would like to compute the eigenvalue
density functions for $A + B$ and $AB$ in terms of the moments of $A$ and $B$.

Of course, $A$ and $B$ do not commute so we are in the realm of non-commutative
algebra. Since all possible products of $A$ and $B$ are allowed we have the “free” product,
i.e., all words in $A$ and $B$ are allowed. (We recall that this is precisely the definition
of the free product in algebra.) The theory of free probability allows us to compute
the moments of these products in the large matrix limit, $N \to \infty$, so long as $A$
and $B$ are (asymptotically) free. In that sense (asymptotic) freeness, for large random
matrices, is considered the analogue of independence for scalar valued random variables.
Remarkably, asymptotic freeness results whenever $A$ (or $B$) has isotropically random
eigenvectors so that they bear no relationship to the eigenvectors of $B$ (or $A$, resp.).
In other words, a sufficient condition for asymptotic freeness of $A$ and $B$ is that
the eigenvectors of $A$ (or $B$) are uniformly distributed with Haar measure.

When $A$ and $B$ are asymptotically free, the limiting eigenvalue density function
of $A + B$ (or $AB$) is the free additive (or multiplicative) convolution of the limiting
eigenvalue density functions of $A$ and $B$, thereby mirroring the structure for the sums
and products of independent scalar valued random variables. In this sense, the
development of free probability theory constitutes a breakthrough in our understanding of
the behavior of large random matrices. Despite this elegant formulation, researchers
were only able to use the underlying free convolution machinery for concrete
computations for some simple cases. In this thesis, we solve this problem by establishing a
computational free probability framework (see Section 8.4).

■ 2.2.2 Algebraic random matrices and a random matrix “calculator”

The development of this framework accompanied our characterization of the class of
algebraic random matrices for which the limiting eigenvalue distribution and the
associated moments can be concretely computed. The class of algebraic random matrices
is defined next.


Definition 1 (Algebraic random matrices). Let $F^A(x)$ denote the limiting
eigenvalue distribution function of a sequence of random matrices $A_N$. If a bivariate
polynomial $L_{mz}(m,z)$ exists such that

$$
m_A(z) = \int \frac{1}{x - z}\, dF^A(x), \qquad z \in \mathbb{C}^+ \setminus \mathbb{R},
$$

is a solution of $L_{mz}(m_A(z), z) = 0$ then $A_N$ is said to be an algebraic random matrix.
The density function $f_A = dF^A$ is referred to as an algebraic density and we say that
$A_N \in M_{\mathrm{alg}}$, the class of algebraic random matrices.


The utility of this, admittedly technical, definition comes from the fact that we
are able to concretely specify the generators of this class. We illustrate this with a
simple example. Let $G$ be an $n \times m$ random matrix with i.i.d. zero-mean normal entries
with variance $1/m$. The matrix $W(c) = GG'$ is the Wishart matrix parameterized by
$c = n/m$. Let $A$ be an arbitrary algebraic random matrix independent of $W(c)$.

Figure  2-2  identifies  deterministic  and  stochastic  operations  that  can  be  performed 
on  A  so  that  the  resulting  matrix  is  algebraic  as  well.  The  calculator  analogy  is  apt 
because  once  we  start  with  an  algebraic  random  matrix,  if  we  keep  pushing  away  at  the 
buttons  we  still  get  an  algebraic  random  matrix  whose  limiting  eigenvalue  distribution 
is  concretely  computable  using  the  algorithms  developed  in  Section  8. 

The algebraicity definition is important because everything we want to know about
the limiting eigenvalue distribution of $A$ is encoded in the bivariate polynomial $L^A_{mz}(m,z)$.
Thus, in establishing the algebraicity of any of the transformations in Figure 2-2,
we have in effect determined the operational law for the polynomial transformation
$L^A_{mz}(m,z) \mapsto L^B_{mz}(m,z)$ corresponding to the random matrix transformation $A \mapsto B$.

The  catalogue  of  admissible  transformations  and  their  software  realization  is  found 
in  Section  8.  This  then  allows  us  to  calculate  the  eigenvalue  distribution  functions  of  a 
large  class  of  algebraic  random  matrices  that  are  generated  from  other  algebraic  random 
matrices. 

We  illustrate  the  underlying  technique  of  mapping  canonical  operations  of  random 
matrices  into  operations  on  the  bivariate  polynomials  with  a  simple  example.  Suppose 
we take the Wigner matrix, sampled in MATLAB as:

    G = sign(randn(N))/sqrt(N); A = (G+G')/sqrt(2);

Figure 2-2. A random matrix calculator where a sequence of deterministic and stochastic operations
performed on an “algebraically characterizable” random matrix sequence $A_N$ produces an “algebraically
characterizable” random matrix sequence $B_N$. The limiting eigenvalue density and moments of a
“characterizable” matrix can be computed numerically, with the latter often in closed form.
Deterministic operations: $A + \alpha I$, $\alpha \times A$, $A^{-1}$, $(pA + qI)/(rA + sI)$.
Stochastic operations: $A + W(c)$, $W(c) \times A$, $W^{-1}(c) \times A$, $(A^{1/2} + G)(A^{1/2} + G)'$.


whose eigenvalues in the $N \to \infty$ limit follow the semicircle law, and the Wishart matrix
which may be sampled in MATLAB as:

    G = randn(N,2*N)/sqrt(2*N); B = G*G';

whose eigenvalues in the limit follow the Marčenko-Pastur law. The associated limiting
eigenvalue distribution functions have Stieltjes transforms $m_A(z)$ and $m_B(z)$ that are
solutions of the equations $L^A_{mz}(m,z) = 0$ and $L^B_{mz}(m,z) = 0$, respectively, where

$$
L^A_{mz}(m,z) = m^2 + zm + 1, \qquad L^B_{mz}(m,z) = m^2 z - (-2z + 1)m + 2.
$$

The sum and product of these random matrices have limiting eigenvalue distributions
whose Stieltjes transforms are solutions of the bivariate polynomial equations
$L^{A+B}_{mz}(m,z) = 0$ and $L^{AB}_{mz}(m,z) = 0$, respectively, which can be calculated from
$L^A_{mz}$ and $L^B_{mz}$ alone as shown below.

To obtain $L^{A+B}_{mz}(m,z)$ we apply the transformation labelled as “Add Atomic Wishart”
in Table 8.1 with $c = 2$, $p_1 = 1$ and $\lambda_1 = 0.5$ to obtain the operational law

$$
L^{A+B}_{mz}(m,z) = L^A_{mz}\left(m,\; z - \frac{1}{1 + 0.5m}\right). \tag{2.8}
$$

Substituting $L^A_{mz} = m^2 + zm + 1$ in (2.8) and clearing the denominator yields the
bivariate polynomial

$$
L^{A+B}_{mz}(m,z) = m^3 + (z + 2)m^2 - (-2z + 1)m + 2. \tag{2.9}
$$

Similarly, to obtain $L^{AB}_{mz}(m,z)$ we apply the transformation labelled as “Multiply Wishart”
in Table 8.1 with $c = 0.5$ to obtain the operational law

$$
L^{AB}_{mz}(m,z) = L^A_{mz}\left((0.5 - 0.5zm)\,m,\; \frac{z}{0.5 - 0.5zm}\right). \tag{2.10}
$$

Substituting $L^A_{mz} = m^2 + zm + 1$ in (2.10) and clearing the denominator yields the
bivariate polynomial

$$
L^{AB}_{mz}(m,z) = m^4 z^2 - 2m^3 z + m^2 + 4mz + 4. \tag{2.11}
$$
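The two hand calculations can be sanity-checked numerically. The sketch below is not the thesis software; it assumes the substitution rules $z \mapsto z - 1/(1 + 0.5m)$ for the sum and $(m, z) \mapsto ((0.5 - 0.5zm)m,\ z/(0.5 - 0.5zm))$ for the product, applies them to the semicircle polynomial $m^2 + zm + 1$, clears the denominators, and compares the result with the closed forms (2.9) and (2.11) at random complex points. All function names are hypothetical.

```python
import random


def L_A(m, z):
    """Semicircle polynomial L^A_mz(m, z) = m^2 + z m + 1."""
    return m * m + z * m + 1


def L_sum_from_law(m, z):
    """L^A at the shifted z, multiplied by the cleared denominator (2 + m)."""
    return L_A(m, z - 1 / (1 + 0.5 * m)) * (2 + m)


def L_sum_closed(m, z):
    """Right-hand side of (2.9)."""
    return m**3 + (z + 2) * m**2 - (-2 * z + 1) * m + 2


def L_prod_from_law(m, z):
    """L^A at the transformed (m, z), multiplied by the cleared factor 4."""
    u = 0.5 - 0.5 * z * m
    return 4 * L_A(u * m, z / u)


def L_prod_closed(m, z):
    """Right-hand side of (2.11)."""
    return m**4 * z**2 - 2 * m**3 * z + m**2 + 4 * m * z + 4


def max_mismatch(trials=200, seed=0):
    """Largest absolute difference between the two routes over random points."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        m = complex(rng.uniform(0.1, 2.0), rng.uniform(0.1, 2.0))
        z = complex(rng.uniform(-3.0, 3.0), rng.uniform(0.1, 2.0))
        worst = max(worst,
                    abs(L_sum_from_law(m, z) - L_sum_closed(m, z)),
                    abs(L_prod_from_law(m, z) - L_prod_closed(m, z)))
    return worst


print(max_mismatch())  # expected to be at the level of rounding error
```

The sampling ranges keep both denominators away from zero, so any mismatch beyond rounding error would point to an algebra slip in the substitution.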

Figure 2-3 plots the density function associated with the limiting eigenvalue
distribution for the Wigner and Wishart matrices as well as their sum and product extracted
directly from $L^{A+B}_{mz}(m,z)$ and $L^{AB}_{mz}(m,z)$.
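To indicate how a density can be read off from such a polynomial, the following sketch treats the two quadratic cases only (the Wigner and Wishart polynomials). It assumes the Stieltjes transform is the root with positive imaginary part at $z = x + i\epsilon$ and recovers the density as $\mathrm{Im}\, m / \pi$. The function names and the choice of $\epsilon$ are assumptions of this illustration, not the thesis implementation, which handles higher degree polynomials with a general root finder.

```python
import cmath
import math


def density_from_quadratic(coeffs, x, eps=1e-9):
    """Density at x from a polynomial a(z) m^2 + b(z) m + c(z) = 0.

    coeffs(z) returns the tuple (a, b, c). The root with the larger
    imaginary part at z = x + i*eps is taken as the Stieltjes transform.
    """
    z = complex(x, eps)
    a, b, c = coeffs(z)
    disc = cmath.sqrt(b * b - 4 * a * c)
    roots = [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]
    m = max(roots, key=lambda r: r.imag)
    return max(m.imag, 0.0) / math.pi


def wigner(z):
    """Coefficients of L^A_mz = m^2 + z m + 1."""
    return 1.0, z, 1.0


def wishart(z):
    """Coefficients of L^B_mz = z m^2 - (-2z + 1) m + 2."""
    return z, -(-2 * z + 1), 2.0


for x in (0.0, 1.0, 3.0):
    print(x, density_from_quadratic(wigner, x), density_from_quadratic(wishart, x))
```

For the cubic and quartic polynomials of the sum and product, the same idea applies with a numerical polynomial root finder in place of the quadratic formula, together with a rule for selecting the correct branch.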

In this simple case, the polynomials were obtained by hand calculation. Along with
the theory of algebraic random matrices we also develop a software realization that
maps the entire catalog of transformations (see Tables 8.1-8.3) into symbolic MATLAB
code. Thus, for the example considered, the sequence of commands:

>> syms m z
>> LmzA = m^2+z*m+1;
>> LmzB = m^2*z-(-2*z+1)*m+2;
>> LmzApB = AplusB(LmzA,LmzB);
>> LmzAtB = AtimesB(LmzA,LmzB);

could also have been used to obtain $L^{A+B}_{mz}$ and $L^{AB}_{mz}$. The commands AplusB and
AtimesB implicitly use the free convolution machinery to perform the said computation.

To  summarize,  by  defining  the  class  of  algebraic  random  matrices,  we  are  able  to 
extend  the  reach  of  infinite  random  matrix  theory  well  beyond  the  special  cases  of 
matrices  with  Gaussian  entries.  The  key  idea  is  that  by  encoding  probability  densities 
as  solutions  of  bivariate  polynomial  equations,  and  deriving  the  correct  operational  laws 
on  this  encoding,  we  can  take  advantage  of  powerful  symbolic  and  numerical  techniques 
to  compute  these  densities  and  their  associated  moments.  In  particular,  for  the  examples 
considered,  algebraically  extracting  the  roots  of  these  polynomials  using  the  cubic  or 
quartic formulas would be of little use. Consequently, looking for special cases where
the limiting density function can be written in closed form is needlessly restrictive unless
one is attempting to classify these special random matrix ensembles.
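The remark about moments can be illustrated with a short sketch. Assuming the expansion $m(z) = -\sum_k M_k / z^{k+1}$ of the Stieltjes transform for large $|z|$, substituting it into the bivariate polynomial and matching powers of $1/z$ yields a recursion for the moments $M_k$. The code below does this for the two polynomials of the running example; the function names are hypothetical and the recursions were derived for this illustration rather than taken from the thesis.

```python
def semicircle_moments(order):
    """Moments M_0..M_order of the semicircle law from m^2 + z m + 1 = 0.

    Matching powers of 1/z gives M_0 = 1, M_1 = 0 and
    M_k = sum over a + b = k - 2 of M_a * M_b for k >= 2.
    """
    M = [1, 0]
    for k in range(2, order + 1):
        M.append(sum(M[a] * M[k - 2 - a] for a in range(k - 1)))
    return M[:order + 1]


def wishart_moments(order):
    """Moments of the Wishart (c = 1/2) law from z m^2 - (-2z + 1) m + 2 = 0.

    The same matching gives M_0 = 1 and
    M_k = (M_{k-1} + sum over a + b = k - 1 of M_a * M_b) / 2 for k >= 1.
    """
    M = [1.0]
    for k in range(1, order + 1):
        conv = sum(M[a] * M[k - 1 - a] for a in range(k))
        M.append(0.5 * (M[k - 1] + conv))
    return M


print(semicircle_moments(6))  # even moments are the Catalan numbers
print(wishart_moments(3))
```

The even semicircle moments reproduce the Catalan numbers 1, 1, 2, 5, which is a convenient check that the sign convention of the Stieltjes transform matches the polynomial.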

The statistical techniques developed in this thesis do not, however, exploit the full
scope of this method, which is developed in Chapters 6-10. The possibility of being able
to characterize matrices more complicated than those formed from Gaussian entries
makes it possible to start thinking about formulating inference procedures for
these more complicated random matrix models. In this thesis, we leave the theory at
that. While we illustrate the power of the method with some examples, we leave it
to practitioners to motivate additional applications that exploit the full power of the
stochastic eigen-analysis techniques developed.

■  2.3  Contributions  of  this  thesis 

As  an  addition  to  the  table  of  contents,  we  now  itemize  the  results  we  consider  most 
important  and  where  they  can  be  found.  We  remark  that  all  statements  labelled  as 
theorems  represent,  we  believe,  new  results,  while  important  results  from  the  literature 
are labelled as propositions or lemmas. Chapters 3-5 are self-contained and can be read
independently. Chapters 6-10 describe the “polynomial method” for characterizing a
broad class of random matrices and may be read separately from the preceding material.
Where appropriate, every chapter contains a section on future work and other directions
of  research.  The  thesis  contributions  are: 

• New algorithm for detecting the number of signals in white noise from the sample
eigenvalues alone that dramatically outperforms the Wax-Kailath estimator (see
Chapter 3 for details and Table 3.1 for the algorithm). This solves a long-standing
open problem in sensor array processing.

• Heuristic explanation of the phase transition phenomenon for the largest (“signal”)
eigenvalues. This establishes a fundamental limit in detection using the signal
eigenvalues alone. Roughly speaking, for a large number of sensors and snapshots,
the signals can be reliably detected using the method developed if

$$
\text{Signal Power} > \text{Noise Power} \times \sqrt{\frac{\text{Number of Sensors}}{\text{Number of Snapshots}}}.
$$

Consistency of the estimators in Table 3.1 with respect to the concept of effective
number of signals is discussed in Section 3.7.

• New eigen-inference techniques for testing equality of population eigenvalues and
parametrically estimating population eigenvalues are presented in Chapter 4.
These techniques supplant Anderson's techniques for high-dimensional, sample
size constrained settings.

Figure 2-3. A representative computation using the random matrix calculator.
(a) The limiting eigenvalue density function for the GOE and Wishart matrices.
(b) The limiting eigenvalue density function for the sum and product of independent GOE and Wishart matrices.

• We provide an approximation for the distribution of the outputs of the diagonally
loaded  Capon-MVDR  beamformer  in  Chapter  5  solving  an  open  problem  posed 
by  Baggeroer  and  Cox  in  [7]. 

• We describe a class of “algebraic” random matrices (see Chapter 7). These are
random matrices for which the Stieltjes transform of the limiting eigenvalue
distribution function is an algebraic function. Wishart matrices with identity
covariance are a special case. The practical utility of this definition is that if a random
matrix is shown to be algebraic then its limiting eigenvalue density function can
be computed using a simple root-finding algorithm. Furthermore, if the moments
exist, then we will often be able to enumerate them efficiently in closed form. By
specifying the class of such random matrices by its generators we solve an open
problem in computational random matrix theory by extending the reach of the
theory to concretely predict the limiting distribution of a much broader class of
random matrices than thought possible.

• We describe the computation of the Markov transition kernel for certain classes of
algebraic random matrices. The Markov transition kernel encodes the conditional
“eigenvector distribution” (see Chapter 10 for a precise description) of algebraic
random matrices. The computation facilitates analysis, for the first time, of the
eigenvectors of a broad subclass of algebraic random matrices including those with
Wishart distribution.


Chapter  3 


Statistical  eigen-inference: 
Signals  in  white  noise 


■  3.1  Introduction 

The observation vector, in many signal processing applications, can be modelled as a
superposition of a finite number of signals embedded in additive noise. Detecting the
number of signals present becomes a key issue and is often the starting point for the
signal parameter estimation problem. When the signals and the noise are assumed to be
samples of a stationary, ergodic Gaussian vector process, the sample covariance matrix
formed from $m$ observations has the Wishart distribution [114]. The proposed
algorithm uses an information theoretic approach for determining the number of signals in
white noise by examining the eigenvalues of the resulting sample covariance matrix. An
essential component of the proposed estimator is its explicit dependence on the
dimensionality of the observation vector and the number of samples used to form the sample
covariance matrix. This makes the proposed estimator robust to high-dimensionality
and sample size constraints.

The form of the estimator is motivated by results on the eigenvalues of large
dimensional sample covariance matrices [10-12, 32, 49, 51, 70]. We are able to re-derive a
portion of these results [11, 12, 70], reported by other authors in the literature, using an
interacting particle system interpretation, thereby providing insight into the structure of
the proposed solution and its shortcomings. The concept of effective number of signals
is introduced (see Section 3.7), which depends in a simple manner on the noise variance,
sample size and dimensionality of the system. This concept captures the fundamental
limits of sample eigenvalue based detection by explaining why, asymptotically, if the
signal level is below a threshold that depends on the noise variance, sample size and the
dimensionality of the system, then reliable detection is not possible. More importantly,
the proposed estimators dramatically outperform the standard estimators found in the
literature, particularly so in sample starved settings. While such a behavior is to be
expected when the dimensionality of the system is large because of the nature of the
random matrix results being exploited, this trend is observed in smaller dimensional
settings as well.

This chapter is organized as follows. The problem formulation in Section 3.2 is
followed by an exposition in Section 3.3, using an interacting particle system
interpretation, of the properties of the eigenvalues of large dimensional sample covariance
matrices when there are no signals present. The analysis is extended in Section 3.4 to the
case when there are signals present. The occurrence of a phase transition phenomenon
in the identifiability of the largest (“signal”) eigenvalue is heuristically described and
re-derived using an interacting particle system interpretation. An estimator for the
number of signals present that exploits these results is derived in Section 3.5. An
extension of these results to the frequency domain is discussed in Section 3.6. Consistency
of the proposed estimators and the concept of effective number of signals is discussed
in Section 3.7. Simulation results that illustrate the superior performance of the new
method in high dimensional, sample size starved settings are presented in Section 3.8;
some concluding remarks are presented in Section 3.9.


■  3.2  Problem  formulation 


We observe $m$ samples (“snapshots”) of possibly signal bearing $n$-dimensional snapshot
vectors $x_1, \ldots, x_m$ where for each $i$, $x_i \sim \mathcal{N}_n(0, R)$ and the $x_i$ are mutually independent.
The snapshot vectors are modelled as

$$
x_i =
\begin{cases}
z_i & \text{No Signal} \\
A s_i + z_i & \text{Signal Present}
\end{cases}
\qquad \text{for } i = 1, \ldots, m, \tag{3.1}
$$

where $z_i \sim \mathcal{N}_n(0, \sigma^2 I)$ denotes an $n$-dimensional (real or complex) Gaussian noise
vector where $\sigma^2$ is generically unknown, $s_i \sim \mathcal{N}_k(0, R_s)$ denotes a $k$-dimensional
(real or complex) Gaussian signal vector with covariance $R_s$, and $A$ is an
$n \times k$ unknown non-random matrix. In array processing applications, the $j$-th column
of the matrix $A$ encodes the parameter vector associated with the $j$-th signal whose
magnitude is described by the $j$-th element of $s_i$.

Since the signal and noise vectors are independent of each other, the covariance
matrix of $x_i$ can hence be decomposed as

$$
R = \Psi + \sigma^2 I \tag{3.2}
$$

where

$$
\Psi = A R_s A', \tag{3.3}
$$

with $'$ denoting the conjugate transpose. Assuming that the matrix $A$ is of full column
rank, i.e., the columns of $A$ are linearly independent, and that the covariance matrix
of the signals $R_s$ is nonsingular, it follows that the rank of $\Psi$ is $k$. Equivalently, the
$n - k$ smallest eigenvalues of $\Psi$ are equal to zero.

If we denote the eigenvalues of $R$ by $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_n$ then it follows that the
smallest $n - k$ eigenvalues of $R$ are all equal to $\sigma^2$ so that

$$
\lambda_{k+1} = \lambda_{k+2} = \ldots = \lambda_n = \lambda = \sigma^2. \tag{3.4}
$$

Thus, if the true covariance matrix $R$ were known a priori, the dimension of the signal
vector $k$ could be determined from the multiplicity of the smallest eigenvalue of $R$. When
there is no signal present, all the eigenvalues of $R$ will be identical. The problem in
practice is that the covariance matrix $R$ is unknown so that such a straightforward
algorithm cannot be used. The signal detection and estimation problem is hence posed
in terms of an inference problem on $m$ samples of $n$-dimensional multivariate real or
complex Gaussian snapshot vectors.
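The idealized procedure that would apply if $R$ were known can be written in a few lines. This is only a sketch of the multiplicity argument above, assuming the eigenvalues of the true covariance matrix are supplied as a list; the function name and the tolerance used to compare floating point values are assumptions of the illustration.

```python
def signals_from_true_spectrum(eigenvalues, tol=1e-9):
    """Return k = n - (multiplicity of the smallest eigenvalue of R).

    eigenvalues: eigenvalues of the true covariance matrix R.
    tol: relative tolerance for deciding that two eigenvalues are equal
         (an assumption of this sketch, needed for floating point input).
    """
    smallest = min(eigenvalues)
    scale = max(abs(smallest), 1.0)
    multiplicity = sum(1 for ev in eigenvalues
                       if abs(ev - smallest) <= tol * scale)
    return len(eigenvalues) - multiplicity


# Hypothetical spectrum: three signal eigenvalues above a noise floor of 1.
print(signals_from_true_spectrum([10.0, 5.0, 3.0, 1.0, 1.0, 1.0]))
```

With sample eigenvalues in place of the true ones, the noise eigenvalues are no longer exactly equal, which is why this direct count cannot be used in practice.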

Inferring  the  number  of  signals  from  these  m  samples  reduces  the  signal  detection 
problem  to  a  model  selection  problem  for  which  there  are  many  approaches.  A  classical 
approach to this problem, developed by Bartlett [13] and Lawley [54], uses a sequence
of  hypothesis  tests.  Though  this  approach  is  sophisticated,  the  main  problem  is  the 
subjective  judgement  needed  by  the  practitioner  in  selecting  the  threshold  levels  for  the 
different  tests. 

Information theoretic criteria for model selection such as those developed by Akaike
[1, 2], Schwarz [80] and Rissanen [76] address this problem by proposing the selection of
the model which gives the minimum information criterion. The criterion for the various
approaches is generically a function of the log-likelihood of the maximum likelihood
estimator of the parameters of the model and a term which depends on the number of
parameters of the model that penalizes overfitting of the model order.

For the problem formulated above, Wax and Kailath propose an estimator [111] for
the number of signals (assuming $m > n$) based on the eigenvalues $l_1 \ge l_2 \ge \ldots \ge l_n$ of
the sample covariance matrix (SCM) defined by

$$
\hat{R} = \frac{1}{m} \sum_{i=1}^{m} x_i x_i' = \frac{1}{m} X X' \tag{3.5}
$$

where $X = [x_1 | \ldots | x_m]$ is the matrix of observations (samples). The Akaike Information
Criterion (AIC) form of the estimator is given by


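$$
\hat{k}_{\mathrm{AIC}} = \arg\min_{k \in \mathbb{N}:\, 0 \le k < n} \; -2(n - k)\, m \log \frac{g(k)}{a(k)} + 2k(2n - k) \tag{3.6}
$$

while the Minimum Description Length (MDL) criterion is given by

$$
\hat{k}_{\mathrm{MDL}} = \arg\min_{k \in \mathbb{N}:\, 0 \le k < n} \; -(n - k)\, m \log \frac{g(k)}{a(k)} + \frac{1}{2} k(2n - k) \log m \tag{3.7}
$$

where $g(k) = \prod_{j=k+1}^{n} l_j^{1/(n-k)}$ is the geometric mean of the $n - k$ smallest sample
eigenvalues and $a(k) = \frac{1}{n-k} \sum_{j=k+1}^{n} l_j$ is their arithmetic mean.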
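The two criteria translate directly into code. The sketch below follows the reading of (3.6) and (3.7) in which $g(k)$ and $a(k)$ are the geometric and arithmetic means of the $n - k$ smallest sample eigenvalues; it assumes strictly positive sample eigenvalues are already available, and the function name is hypothetical.

```python
import math


def wax_kailath(sample_eigs, m):
    """Return (k_AIC, k_MDL) from the sample eigenvalues and m snapshots."""
    l = sorted(sample_eigs, reverse=True)
    n = len(l)
    best = {}
    for k in range(n):
        tail = l[k:]  # the n - k smallest sample eigenvalues
        log_g = sum(math.log(x) for x in tail) / (n - k)  # log geometric mean
        log_a = math.log(sum(tail) / (n - k))             # log arithmetic mean
        fit = (n - k) * m * (log_g - log_a)
        aic = -2.0 * fit + 2.0 * k * (2 * n - k)
        mdl = -fit + 0.5 * k * (2 * n - k) * math.log(m)
        for name, score in (("aic", aic), ("mdl", mdl)):
            if name not in best or score < best[name][0]:
                best[name] = (score, k)
    return best["aic"][1], best["mdl"][1]


# Hypothetical, well separated sample spectrum with m = 100 snapshots.
print(wax_kailath([10.0, 5.0, 1.0, 1.0, 1.0, 1.0], m=100))
```

The log-ratio term vanishes when the trailing eigenvalues are equal and becomes negative as they spread out, so the penalty term alone decides among the candidates once the apparent noise eigenvalues look flat.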

These estimators perform adequately only when the sample size exceeds the
dimension of the system by a factor of 15-100. While their large sample consistency
has been analytically established, these results do not lend any insight into the
shortcomings in situations where the dimensionality of the system is large or the
sample size is on the order of the dimensionality or both, as is increasingly the case
in many signal processing and scientific applications. The reason why these estimators
perform so poorly while the proposed estimators, summarized in Table 3.1, perform so
well in these settings is best illustrated by an example.

Figure 3-1. Blurring of sample eigenvalues due to finite number of snapshots.
(a) True eigenvalues. (b) Sample eigenvalues ($m = n = 20$).

Figure  3-1  compares  the  20  eigenvalues  of  the  true  covariance  matrix  (with  noise 
variance  equal  to  1)  with  the  eigenvalues  of  a  single  sample  covariance  matrix  formed 
from 20 snapshots. The three “signal” eigenvalues can be readily distinguished in the
true  covariance  eigen-spectrum;  the  distinction  is  less  clear  in  the  sample  eigen-spectrum 
because  of  the  significant  blurring  of  the  signal  and  noise  eigenvalues. 

Traditional  estimators,  including  the  Wax-Kailath  algorithm,  perform  poorly  in 
high-dimensional,  sample  size  constrained  settings  because  they  do  not  account  for  this 
blurring;  the  proposed  estimators  are  able  to  overcome  this  limitation  by  explicitly 
exploiting  analytical  results  that  capture  the  dependence  of  the  blurring  on  the  noise 
variance, the sample size and the dimensionality of the system. The applicability of
the algorithms in scenarios where the sample size is less than the dimensionality of the
system is a feature that makes them suitable for sensor array processing and other emerging
applications in science and finance where such situations are routinely encountered.

Furthermore, the analytical results provide insight into the fundamental limit, due
to sample size constraints in high-dimensional settings, of reliable signal detection by
eigen-inference, i.e., by using the sample eigenvalues alone (see Section 3.7). This helps
identify scenarios where algorithms that exploit any structure in the eigenvectors of
the signals, such as the MUSIC [79] and the Capon-MVDR [19] algorithms in sensor
array processing, might be better able to tease out lower level signals from the
background noise. It is worth noting that the proposed approach remains relevant in
situations where the eigenvector structure has been identified. This is because
eigen-inference methodologies are inherently robust to eigenvector modelling errors that occur
in high-dimensional settings. Thus the practitioner may use the proposed methodologies
to complement and “robustify” the inference provided by algorithms that exploit
the eigenvector structure.


■  3.3  Eigenvalues  of  the  (null)  Wishart  matrix 

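When there are no signals present, $R = \lambda I$ so that the SCM $\hat{R}$ is sampled from the
(null) Wishart distribution [114]. The joint density function of the eigenvalues $l_1, \ldots, l_n$
of $\hat{R}$ when $m > n + 1$ is given by [67]

$$
f(l_1, \ldots, l_n) = Z^{\beta}_{n,m} \exp\left(-\frac{\beta m}{2\lambda} \sum_{i=1}^{n} l_i\right) \prod_{i=1}^{n} l_i^{\beta(m-n+1)/2 - 1} \prod_{i<j} |l_i - l_j|^{\beta} \tag{3.8}
$$

where $l_1 > \ldots > l_n > 0$, $Z^{\beta}_{n,m}$ is the normalization constant and $\beta = 1$ (or 2) when
$\hat{R}$ is real (or complex). Taking the negative logarithm of the joint density function in
(3.8) and defining $\ell = (l_1, \ldots, l_n)$ gives us the negative log-likelihood function

$$
L(\ell) := -\log Z^{\beta}_{n,m} - \left(\frac{\beta}{2}(m - n + 1) - 1\right) \sum_{i=1}^{n} \log l_i + \frac{\beta m}{2\lambda} \sum_{i=1}^{n} l_i - \beta \sum_{i<j} \log |l_i - l_j|. \tag{3.9}
$$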

■  3.3.1  An  interacting  particle  system  interpretation 

Let the sample eigenvalues $l_1, \ldots, l_n$ represent locations of particles; then (3.9) can
be interpreted, in statistical physics terms, as the logarithmic energy of this system of
particles. Note that we constrain the particles to lie along the positive real axis so that
$l_i > 0$.

The configuration of the particles that minimizes the logarithmic energy (assuming
a unique minimum exists) is simply the maximum likelihood estimate of the sample
eigenvalues. For the system represented in (3.9), it turns out that a unique minimum
exists so we can proceed with trying to qualitatively predict the equilibrium
configuration of the particles. Consider the rescaled (by $1/n^2$) logarithmic energy given by

$$
\frac{L(\ell)}{n^2} = \text{constant} - \left(\frac{\beta}{2}\left(\frac{1}{c_m} - 1\right) + \frac{\beta - 2}{2n}\right) \frac{1}{n} \sum_{i=1}^{n} \log l_i - \frac{\beta}{n^2} \sum_{i<j} \log |l_i - l_j| + \frac{\beta}{2 \lambda c_m} \frac{1}{n} \sum_{i=1}^{n} l_i \tag{3.10}
$$

with $c_m = n/m < 1$. The equilibrium position of the particles is the configuration that
minimizes the logarithmic energy of the system given by (3.10) subject to the forces
identified by the roman numerals. This involves balancing the three competing “forces”
depicted in Figure 3-2. If the particles are placed in some arbitrary position as in Figure
3-2(a), they will be subjected to the competing forces described below, interact with
each other and eventually reach an equilibrium configuration as in Figure 3-2(b).

Figure 3-2. Interacting particle system interpretation of the eigenvalues of a (null) Wishart matrix.
(a) Non-equilibrium configuration of particles (sample eigenvalues).
(b) Equilibrium configuration of particles (sample eigenvalues).
The forces acting on the particles are labelled I, II and III.

The term

$$
T_1 := -\left(\frac{\beta}{2}\left(\frac{1}{c_m} - 1\right) + \frac{\beta - 2}{2n}\right) \frac{1}{n} \sum_{i=1}^{n} \log l_i \approx -\frac{\beta}{2}\left(\frac{1}{c_m} - 1\right) \frac{1}{n} \sum_{i=1}^{n} \log l_i \tag{3.11}
$$

represents a repulsion from the origin that is minimized when the particles are further
away from the origin, i.e., for larger values of $l_i$. The term

$$
T_2 := -\frac{\beta}{n^2} \sum_{i<j} \log |l_i - l_j| \tag{3.12}
$$

represents an inter-particle repulsion that is minimized when the particles (sample
eigenvalues) are spaced out as far apart as possible so that the difference $|l_i - l_j|$ is
large. Finally, the term

$$
T_3 := \frac{\beta}{2 \lambda c_m} \frac{1}{n} \sum_{i=1}^{n} l_i \tag{3.13}
$$

represents an attraction to the origin that is minimized when the particles are closer to
the origin, i.e., for small $l_i$. Generically speaking, for arbitrary $c_m < 1$ (and large $m$, $n$),
since $\log l_i < l_i$, comparing (3.11) and (3.13), the particles experience an attraction to
the origin that is greater than the repulsion away from the origin. Thus we can expect
the sample eigenvalues to be distributed about $x = \lambda$ with a greater concentration
closer towards the origin as depicted in Figure 3-2(b).
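The balance of forces can be explored numerically. The sketch below is an illustration only: it assumes the energy has the form of the negative log-likelihood of the (null) Wishart eigenvalues without its constant term, namely $-\alpha \sum_i \log l_i + \frac{\beta m}{2\lambda} \sum_i l_i - \beta \sum_{i<j} \log|l_i - l_j|$ with $\alpha = \frac{\beta}{2}(m - n + 1) - 1$, and it minimizes this energy by plain gradient descent with backtracking. The function names, step rule and starting configuration are assumptions, and the starting positions must be distinct and positive.

```python
import math


def energy(l, m, beta=1.0, lam=1.0):
    """Logarithmic energy of particles at positions l (constant term dropped)."""
    n = len(l)
    alpha = 0.5 * beta * (m - n + 1) - 1.0
    e = -alpha * sum(math.log(x) for x in l) + beta * m / (2.0 * lam) * sum(l)
    for i in range(n):
        for j in range(i + 1, n):
            e -= beta * math.log(abs(l[i] - l[j]))
    return e


def gradient(l, m, beta=1.0, lam=1.0):
    """Partial derivatives of the energy with respect to each position."""
    n = len(l)
    alpha = 0.5 * beta * (m - n + 1) - 1.0
    grad = []
    for i in range(n):
        g = -alpha / l[i] + beta * m / (2.0 * lam)
        g -= sum(beta / (l[i] - l[j]) for j in range(n) if j != i)
        grad.append(g)
    return grad


def is_valid(l):
    """Particles must stay positive and keep their descending order."""
    return l[-1] > 0 and all(a > b for a, b in zip(l, l[1:]))


def equilibrium(start, m, beta=1.0, lam=1.0, max_iter=5000, tol=1e-9):
    """Gradient descent with backtracking from distinct positive positions."""
    l = sorted(start, reverse=True)
    e = energy(l, m, beta, lam)
    for _ in range(max_iter):
        g = gradient(l, m, beta, lam)
        g2 = sum(x * x for x in g)
        if g2 < tol * tol:
            break
        t = 1.0
        while t > 1e-18:
            cand = [x - t * gx for x, gx in zip(l, g)]
            if is_valid(cand):
                e_new = energy(cand, m, beta, lam)
                if e_new <= e - 1e-4 * t * g2:
                    break  # sufficient decrease: accept this step
            t *= 0.5
        else:
            break  # no acceptable step was found: stop at the current point
        l, e = cand, e_new
    return l


positions = equilibrium([2.0, 1.5, 1.0, 0.5], m=40)
print(positions, sum(positions) / len(positions))
```

Setting the gradient to zero, multiplying the $i$-th equation by $l_i$ and summing over $i$ shows that the equilibrium positions of this energy average to $\lambda(1 - 2/(\beta m))$, slightly below $\lambda$, which gives a simple check on the computed configuration.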

■  3.3.2  Sample  eigenvalues  in  the  snapshot  abundant  regime 

Continuing this physical analogy further, observe that in (3.10) the ratio $c_m = n/m < 1$
does not affect the (internal) repulsion between the particles (sample eigenvalues).
Thus, for a fixed choice of $\lambda$, the value of $c_m$ affects the equilibrium position by
governing the manner in which the repulsion between the particles ($T_2$) is balanced by the
repulsion/attraction of the origin ($T_1$ and $T_3$ respectively).

In the snapshot abundant regime, where the number of snapshots is significantly
greater than the dimensionality of the system, we obtain values of $c_m = n/m$ very close
to zero. Thus, since $m \gg n$, $1/c_m \gg 1$ so that the interaction between the particles
can be neglected. In other words, the equilibrium configuration minimizes

$$
T_1 + T_3 \approx \frac{\beta}{2 c_m} \, \frac{1}{n} \sum_{i=1}^{n} \left(\frac{l_i}{\lambda} - \log l_i\right)
$$

so that the equilibrium configuration of the $i$-th particle is determined by the condition

$$
\frac{\partial}{\partial l_i} \left(\frac{l_i}{\lambda} - \log l_i\right) = 0.
$$

This is equivalent to the condition

$$
\frac{1}{\lambda} - \frac{1}{l_i} = 0,
$$

resulting in $l_i = \lambda$ for both $\beta = 1$ (real) or $\beta = 2$ (complex), as expected. In other
words, as $c_m \to 0$, $l_i \to \lambda$ for $i = 1, \ldots, n$ so that the eigenvalues of the SCM converge
to the (single) eigenvalue of the true covariance matrix.

Figure 3-3. Limiting distribution of the eigenvalues of the (null) Wishart matrix.

■  3.3.3  Limiting  distribution  of  the  sample  eigenvalues 

Generically speaking, for arbitrary values of $c_m = n/m$, the limiting distribution of
the sample eigenvalues is influenced by all of the forces depicted in Figure 3-2. The
limiting distribution exists and can be analytically computed. Define the empirical
distribution function (e.d.f.) of the eigenvalues of an $n \times n$ self-adjoint matrix $A_n$ with
$n$ real eigenvalues (counted with multiplicity) as

$$F^{A_n}(x) = \frac{\text{Number of eigenvalues of } A_n \le x}{n}. \qquad (3.14)$$


Proposition 3.31. Let $\widehat{R}$ denote a sample covariance matrix formed from an $n \times m$
matrix of observations with i.i.d. Gaussian samples of mean zero and variance $\lambda$. Then
the e.d.f. $F^{\widehat{R}} \to F^W$ almost surely, as $m, n \to \infty$ and $c_m = n/m \to c$, where $F^W$ is a
non-random distribution function with density

$$f^W(x) := dF^W(x) = \max\left(0,\, 1 - \frac{1}{c}\right)\delta(x) + \frac{\sqrt{(x - a_-)(a_+ - x)}}{2\pi\lambda x c}\,\mathbb{I}_{[a_-, a_+]}(x) \qquad (3.15)$$

with $a_\pm = \lambda(1 \pm \sqrt{c})^2$, $\mathbb{I}_{[a,b]}(x) = 1$ when $a \le x \le b$ and zero otherwise, and $\delta(x)$ is the
Dirac delta function.
Proof. This result was proved in [59, 110] in very general settings. Other proofs
include [51, 86, 87]. The distribution may also be obtained by first determining the
(non-random) equilibrium positions of the $n$ particles and then determining the e.d.f.
in the large $n$, $m$ limit. These positions will precisely coincide with the appropriately
normalized zeros of the $n$-th degree Laguerre polynomial. A proof of this fact follows
readily from Szegő's exposition on orthogonal polynomials [99] once the correspondence
between the (null) Wishart distribution and the Laguerre orthogonal polynomial is
recognized as in [34].

The approach taken in [45] explicitly relies on the interacting particle system
interpretation by showing that the density $\mu = f^W$ is the unique minimizer of the functional
obtained from the limit of (3.10)

$$V(\mu) := \text{Constant} - \frac{\beta}{2}\int Q(x)\,\mu(x)\,dx - \frac{\beta}{2}\iint \log|x - y|\,\mu(x)\mu(y)\,dx\,dy, \qquad (3.16)$$

where $Q(x) = (1/c - 1)\log x - x/(\lambda c)$ and $m, n \to \infty$ with $n/m \to c \in (0, 1]$. □

The density $f^W(x)$, with $\lambda = 1$, is shown in Figure 3-3 for different values of $c \in (0, 1]$,
confirming our qualitative prediction about its relative skewing towards the origin for
moderate values of $c$ and a localization about $\lambda = 1$ for values of $c$ close to zero.
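The shape of this density is easy to check numerically. The sketch below evaluates the continuous part of the density in (3.15) for $0 < c < 1$ and approximates its moments with a midpoint rule. The function names `mp_density` and `mp_moment` are illustrative choices, not names used in the source.

```python
import math

def mp_density(x, c, lam=1.0):
    """Continuous part of the limiting density in (3.15); intended for 0 < c < 1."""
    a_minus = lam * (1.0 - math.sqrt(c)) ** 2
    a_plus = lam * (1.0 + math.sqrt(c)) ** 2
    if x <= a_minus or x >= a_plus:
        return 0.0
    return math.sqrt((x - a_minus) * (a_plus - x)) / (2.0 * math.pi * lam * x * c)

def mp_moment(k, c, lam=1.0, steps=20_000):
    """Midpoint-rule approximation of the integral of x^k f(x) over the support."""
    a_minus = lam * (1.0 - math.sqrt(c)) ** 2
    a_plus = lam * (1.0 + math.sqrt(c)) ** 2
    h = (a_plus - a_minus) / steps
    total = 0.0
    for i in range(steps):
        x = a_minus + (i + 0.5) * h
        total += (x ** k) * mp_density(x, c, lam)
    return total * h

# Total mass, mean, and second moment for c = 0.5 and lambda = 1.
mass = mp_moment(0, 0.5)
mean = mp_moment(1, 0.5)
second = mp_moment(2, 0.5)
```

For $c < 1$ the mass should be close to 1, the mean close to $\lambda$, and the second moment close to $\lambda^2(1 + c)$, which is the centering used for the second sample moment later in the chapter.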


■  3.3.4  Gaussian  fluctuations  of  sample  eigenvalues 

The almost sure convergence of the e.d.f. of the (null) Wishart matrix implies that for
any "well-behaved" function $h$,

$$\frac{1}{n}\sum_{i=1}^{n} h(l_i) \to \int h(x)\,dF^W(x), \qquad (3.17)$$

where the convergence is almost sure. In particular, when $h$ is a monomial,
we obtain the moments associated with the density function


We can take this a step further by examining the fluctuations about these limiting
results. Precisely speaking, for $h$ a monomial as above (or more generally), once we
subtract the expected average over the limiting eigenvalue density, i.e., the right hand
side of (3.17), the rescaled resulting quantity tends asymptotically to a normal
distribution with mean and variance depending on $h$.

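Proposition 3.32. If $\widehat{R}$ satisfies the hypotheses of Proposition 3.31 with $\lambda = 1$ then
as $m, n \to \infty$ and $c_m = n/m \to c \in (0, \infty)$,

$$\begin{bmatrix} \sum_{i=1}^{n} l_i - n \\ \sum_{i=1}^{n} l_i^2 - n(1 + c) \end{bmatrix} \xrightarrow{D} \mathcal{N}(\mu_Q, Q)$$

where the convergence in distribution is almost sure, $\beta = 1$ (or 2) when $x_i$ is real (or
complex) valued, respectively, and

$$\mu_Q = \left(\frac{2}{\beta} - 1\right)\begin{bmatrix} 0 \\ c \end{bmatrix} \qquad (3.19)$$

$$Q = \frac{2}{\beta}\begin{bmatrix} c & 2c(c + 1) \\ 2c(c + 1) & 2c(2c^2 + 5c + 2) \end{bmatrix} \qquad (3.20)$$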

Proof. This result appears in [49, 51] for the real case and in [10] for the real and
complex cases. The result for general $\beta$ appears in Dumitriu and Edelman [32]. □

It is worth remarking that while the limiting density of the SCM does not depend
on whether the elements of the observation (snapshot) vectors are real or complex, the
mean and variance of the fluctuations do. The Gaussianity of the eigenvalue fluctuations
is consistent with our association of the limiting density with the maximum likelihood
equilibrium configuration of the interacting particle system. The asymmetric interaction
between the largest eigenvalue and the "bulk" of the eigen-spectrum accounts for the
non-Gaussianity of the fluctuations of the largest eigenvalue, which follow the
Tracy-Widom distribution [50].
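A small Monte-Carlo experiment illustrates the fluctuation result for the first two sample moments in the real case ($\beta = 1$, $\lambda = 1$). This is a sketch under stated assumptions: the centering $n$ and $n(1 + c)$ and the predicted mean and covariance below are taken from the real-case reading of Proposition 3.32, and the sizes, trial count and seed are arbitrary illustrative values. The two moments are computed from traces, so no eigendecomposition is needed.

```python
import numpy as np

def trace_moment_fluctuations(n, m, trials, seed=0):
    """Return a (trials, 2) array of [sum l_i - n, sum l_i^2 - n(1 + c)]."""
    rng = np.random.default_rng(seed)
    c = n / m
    out = np.empty((trials, 2))
    for t in range(trials):
        x = rng.standard_normal((n, m))       # real snapshots: beta = 1, lambda = 1
        r_hat = x @ x.T / m                   # sample covariance matrix
        out[t, 0] = np.trace(r_hat) - n       # sum of eigenvalues, centered
        # tr(R^2) equals the squared Frobenius norm for a symmetric matrix
        out[t, 1] = np.sum(r_hat * r_hat) - n * (1.0 + c)
    return out

n, m = 20, 40
c = n / m
fluct = trace_moment_fluctuations(n, m, trials=2000)
sample_mean = fluct.mean(axis=0)
sample_cov = np.cov(fluct, rowvar=False)

# Asymptotic prediction for beta = 1 (assumed reading of Proposition 3.32).
predicted_mean = np.array([0.0, c])
predicted_cov = 2.0 * np.array([[c, 2 * c * (c + 1)],
                                [2 * c * (c + 1), 2 * c * (2 * c**2 + 5 * c + 2)]])
```

Comparing `sample_mean` with `predicted_mean` and `sample_cov` with `predicted_cov` gives a rough sense of how close moderate values of $n$ and $m$ are to the asymptotic regime. Repeating the experiment with complex snapshots would be expected to remove the mean shift in the second component and halve the covariance.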


■  3.4  Signals  in  white  noise 

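For arbitrary covariance $R$, the joint density function of the eigenvalues $l_1, \ldots, l_n$ of the
SCM $\widehat{R}$ when $m > n + 1$ is given by

$$f(l_1, \ldots, l_n) = \widetilde{Z}^{\beta}_{n,m} \prod_{i=1}^{n} l_i^{\beta(m-n+1)/2 - 1} \prod_{i<j} |l_i - l_j|^{\beta} \int_{Q} \exp\left(-\frac{m\beta}{2}\,\mathrm{Tr}\left(\Lambda^{-1} Q L Q'\right)\right) dQ \qquad (3.21)$$

where $l_i > 0$, $\widetilde{Z}^{\beta}_{n,m}$ is a normalization constant, and $\beta = 1$ (or 2) when $\widehat{R}$ is
real (resp. complex). In (3.21), $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, $L = \mathrm{diag}(l_1, \ldots, l_n)$, $Q \in O(n)$
when $\beta = 1$ while $Q \in U(n)$ when $\beta = 2$, where $O(n)$ and $U(n)$ are, respectively, the
sets of $n \times n$ orthogonal and unitary matrices with Haar measure. The Haar measure is
the unique uniform measure on orthogonal/unitary matrices; see Chapter 1 of Milman
and Schechtman for a derivation [62].

It can be readily seen that when $R = \Lambda = \lambda I$, so that

$$\int_{Q} \exp\left(-\frac{m\beta}{2}\,\mathrm{Tr}\left(\Lambda^{-1} Q L Q'\right)\right) dQ = \exp\left(-\frac{m\beta}{2\lambda}\sum_{i=1}^{n} l_i\right),$$

the joint density in (3.21) reduces to (3.8) when $\widetilde{Z}^{\beta}_{n,m}$ and $Z^{\beta}_{n,m}$ are appropriately
defined.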

In the general case, for arbitrary $R$, the expression for the joint density in (3.21)
is difficult to analyze because of the integral over the orthogonal (or unitary) group.
For the problem we are interested in, when there are $k$ signals in white noise and
$R = \mathrm{diag}(\lambda_1, \ldots, \lambda_k, \lambda, \ldots, \lambda)$, an examination of the large $m$ approximation of this
integral can give us additional insights.

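For this purpose, it suffices to examine the scenario when there is a single real-valued
signal in white noise, i.e., $k = 1$ and $\beta = 1$, for which we may employ the approximation
stated in [67]

$$\int_{Q} \exp\left(-\frac{m}{2}\,\mathrm{Tr}\left(\Lambda^{-1} Q L Q'\right)\right) dQ \approx C_{n,m} \exp\left(-\frac{m}{2\lambda_1}\, l_1\right) \exp\left(-\frac{m}{2\lambda}\sum_{i=2}^{n} l_i\right) \prod_{j=2}^{n} |l_1 - l_j|^{-1/2} \qquad (3.22)$$

with $C_{n,m}$ being a normalization constant so that (3.21) may be approximated by

$$f(l_1, \ldots, l_n) \approx \underbrace{C^{N}_{n,m} \prod_{i=2}^{n} l_i^{(m-n-1)/2} \prod_{2 \le i<j} |l_i - l_j| \exp\left(-\frac{m}{2\lambda}\sum_{i=2}^{n} l_i\right)}_{f_N(l_2, \ldots, l_n)} \times \underbrace{C^{S}_{n,m}\, l_1^{(m-n-1)/2} \prod_{j=2}^{n} |l_1 - l_j|^{1/2} \exp\left(-\frac{m}{2\lambda_1}\, l_1\right)}_{f_S(l_1 | l_2, \ldots, l_n)} \qquad (3.23)$$

Note that the approximated joint density in (3.23) has been decomposed, as shown, into
the product of the joint density, $f_N(l_2, \ldots, l_n)$, of the "noise" eigenvalues and the
conditional density, $f_S(l_1 | l_2, \ldots, l_n)$, of the largest (signal) eigenvalue, where $C^{N}_{n,m}$ and
$C^{S}_{n,m}$ are normalization constants.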


■  3.4.1  Interacting  particle  system  interpretation 

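As before, let the sample eigenvalues $l_1, \ldots, l_n$ represent locations of particles. Thus, the
rescaled (by $1/n^2$) negative log-likelihood of the joint density function is interpreted as
the logarithmic energy of the particle system whose $n$ particles are located at positions
$l_1, \ldots, l_n$. From (3.23), the logarithmic energy may be approximated as

$$V(l) = \underbrace{-\frac{1}{n^2}\log C_{n,m}}_{\text{Constant}} + V_N(l_2, \ldots, l_n) + \frac{1}{n}\,V_S(l_1 | l_2, \ldots, l_n) \qquad (3.24)$$

where the contributions due to the particles that represent the noise and signal
eigenvalues are respectively given by

$$V_N(l_2, \ldots, l_n) = \underbrace{-\frac{1}{2}\left(\frac{1}{c_m} - 1 - \frac{1}{n}\right)\frac{1}{n}\sum_{i=2}^{n}\log l_i}_{\text{I-Noise}} \;\underbrace{-\,\frac{1}{n^2}\sum_{2 \le i<j}\log|l_i - l_j|}_{\text{II-Noise}} \;\underbrace{+\,\frac{1}{2\lambda c_m}\,\frac{1}{n}\sum_{i=2}^{n} l_i}_{\text{III-Noise}} \qquad (3.25a)$$

and

$$V_S(l_1 | l_2, \ldots, l_n) = \underbrace{-\frac{1}{2}\left(\frac{1}{c_m} - 1 - \frac{1}{n}\right)\log l_1}_{\text{I-Signal}} \;\underbrace{-\,\frac{1}{2n}\sum_{j=2}^{n}\log|l_1 - l_j|}_{\text{II-Signal}} \;\underbrace{+\,\frac{l_1}{2 c_m \lambda_1}}_{\text{III-Signal}} \qquad (3.25b)$$

with $c_m = n/m < 1$. The equilibrium configuration of the particles minimizes the
logarithmic energy $V(l)$. The decomposition of the logarithmic energy as in (3.24)
hints at the possibility of predicting the resulting configuration by using a two step
approach. Specifically, for large enough $n$ so that $V_N \gg (1/n)V_S$, the configuration of
the $n - 1$ ("noise") particles, $l_2, \ldots, l_n$, that minimizes $V_N(l_2, \ldots, l_n)$ should be a very
good approximation of the configuration that minimizes $V(l)$.

Figure 3-4. Interactions between the "signal" eigenvalue and the "noise" eigenvalues.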

■ 3.4.2 Repulsion and phase transition of largest ("signal") eigenvalue

Conditioned on the resulting configuration of the $n - 1$ noise particles, the
configuration of the $n$-th particle minimizes $V_S(l_1 | l_2, \ldots, l_n)$. The underbraced terms in (3.25b)
represent the forces that the $n$-th particle is subjected to. They denote, respectively, a
force of repulsion away from the origin, a force of repulsion away from the $n - 1$ noise
particles, and a force of attraction towards the origin as depicted in Figure 3-4. The
equilibrium configuration of the particle is determined by the condition

$$\frac{\partial}{\partial l_1} V_S(l_1 | l_2, \ldots, l_n) = 0 \qquad (3.26)$$

which for large $n$, from (3.25b), reduces to

$$-\frac{1}{2}\left(\frac{1}{c_m} - 1\right)\frac{1}{l_1} + \frac{1}{2 c_m \lambda_1} - \frac{1}{2}\,\frac{1}{n}\sum_{j=2}^{n}\frac{1}{l_1 - l_j} = 0. \qquad (3.27)$$
In the large $n$, $m$ limit, comparing (3.10) and (3.25a), it is clear that the equilibrium
configuration of the $n - 1$ "noise" particles will be very well approximated by the
configuration that results in the no signal case. Asymptotically, it is reasonable to
replace the discrete condition for the minimizing configuration in (3.25b) with its
continuous limit so that the equilibrium location of $l_1$ satisfies the equation

$$-\frac{1}{2}\left(\frac{1}{c} - 1\right)\frac{1}{l_1} + \frac{1}{2c\lambda_1} - \frac{1}{2}\underbrace{\int \frac{1}{l_1 - x}\,dF^W(x)} = 0 \qquad (3.28)$$


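where $F^W(x)$ is the Marčenko-Pastur distribution in (3.15) and $c_m = n/m \to c < 1$ as
$n, m \to \infty$. The Cauchy transform of the distribution function $F^A$ is defined as

$$g^A(z) = \int \frac{1}{z - x}\,dF^A(x) \quad \text{for } z \in \mathbb{C}^+ \setminus \mathbb{R}. \qquad (3.29)$$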


Thus the underbraced term in (3.28) is the Cauchy transform of the distribution function
$F^W(x)$ evaluated at $z = l_1$, so that $g^W(l_1)$ represents the effective repulsive force acting
on the "signal" particle due to the "noise" particles. It can be seen from the definition
of the Cauchy transform itself that $g(z) \sim 1/z$ as $z \to \infty$, so that the effective
repulsion felt by the $n$-th particle decreases the further away it is from the remaining
$n - 1$ particles. The equilibrium configuration of the $n$-th particle is thus given by the
force balancing condition

$$-\frac{1}{2}\left(\frac{1}{c} - 1\right)\frac{1}{l_1} + \frac{1}{2c\lambda_1} - \frac{1}{2}\,g^W(l_1) = 0 \qquad (3.30)$$

where, using the result stated in Proposition 3.31, it can be shown that

$$g^W(z) = \frac{-\lambda + \lambda c + z - \sqrt{\lambda^2 - 2\lambda^2 c - 2z\lambda + \lambda^2 c^2 - 2cz\lambda + z^2}}{2cz\lambda} \qquad (3.31)$$

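defined for all $z \notin [\lambda(1 - \sqrt{c})^2, \lambda(1 + \sqrt{c})^2]$ for $c \in (0, \infty)$ and $z \ne 0$ for $c > 1$. Solving
(3.30) gives us

$$l_1 = \lambda_1\left(1 + \frac{\lambda c}{\lambda_1 - \lambda}\right). \qquad (3.32)$$

To determine if this value of $l_1$ does indeed correspond to the minimum, we need to
evaluate the derivative of the left hand side of (3.30) with respect to $l_1$ at the value
given by (3.32). Symbolic manipulation using Maple yields the expression

$$\frac{1}{2}\,\frac{(\lambda - \lambda_1)^2}{\left(\lambda^2 - 2\lambda_1\lambda - \lambda^2 c + \lambda_1^2\right)\lambda_1^2\, c} \qquad (3.33)$$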
which is positive iff $\lambda_1 > \lambda(1 + \sqrt{c})$ or $\lambda_1 < \lambda(1 - \sqrt{c})$. Since, by definition, $\lambda_1 > \lambda$,
this implies that the equilibrium (equivalently, the logarithmic energy minimizing)
configuration is described by (3.32) only when $\lambda_1 > \lambda(1 + \sqrt{c})$.

When $\lambda < \lambda_1 < \lambda(1 + \sqrt{c})$, (3.33) is negative so that $l_1$ given by (3.32) is a (local)
energy maximizing configuration. The $n$-th particle, which starts out at $\lambda_1 > \lambda$, is
unable to escape to infinity and hence minimizes its energy by "sliding down" towards
the origin. However, it cannot get arbitrarily close to the origin because the
equilibrium configuration of the 2-nd particle, as implied by Proposition 3.31, will, with high
probability, be in a small neighborhood about $\lambda(1 + \sqrt{c})^2$. Hence, for large $n$, when
$\lambda < \lambda_1 < \lambda(1 + \sqrt{c})$, the equilibrium configuration of the $n$-th particle will also be in
a small neighborhood of $\lambda(1 + \sqrt{c})^2$. Thus a phase transition phenomenon occurs so
that (asymptotically) the largest eigenvalue of the SCM is distinct from $\lambda(1 + \sqrt{c})^2$
only when the signal eigenvalues are greater than a certain threshold. This result is
stated next in a more general setting, including the case when there are multiple signals,
thereby lending credibility to the heuristic approximations and arguments we employed
in our derivations.

Proposition 3.41. Let $\widehat{R}$ denote a sample covariance matrix formed from an $n \times m$
matrix of Gaussian observations whose columns are independent of each other and
identically distributed with mean 0 and covariance $R$. Denote the eigenvalues of $R$ by
$\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_k > \lambda_{k+1} = \ldots = \lambda_n = \lambda$. Let $l_j$ denote the $j$-th largest eigenvalue of
$\widehat{R}$. Then as $n, m \to \infty$ with $c_m = n/m \to c \in (0, \infty)$,

$$l_j \to \begin{cases} \lambda_j\left(1 + \dfrac{\lambda c}{\lambda_j - \lambda}\right) & \text{if } \lambda_j > \lambda(1 + \sqrt{c}) \\[2ex] \lambda(1 + \sqrt{c})^2 & \text{if } \lambda_j \le \lambda(1 + \sqrt{c}) \end{cases} \qquad (3.34)$$

where the convergence is almost sure.

Proof. This result appears in [12] for very general settings. A matrix theoretic proof
for when $c < 1$ for the real case may be found in [70] while a determinantal proof for
the complex case may be found in [11]. □
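The phase transition is simple to evaluate numerically. The sketch below assumes the limit takes the form $\lambda_j(1 + \lambda c/(\lambda_j - \lambda))$ above the threshold $\lambda(1 + \sqrt{c})$ and $\lambda(1 + \sqrt{c})^2$ at or below it, which agrees with the worked value $10(1 + 4/9) \approx 14.4$ quoted in Section 3.8.2. The function name is a hypothetical one chosen for illustration.

```python
import math

def limiting_sample_eigenvalue(lam_j, lam, c):
    """Almost-sure limit of the j-th sample eigenvalue (assumed form of the phase transition).

    lam_j : true (population) eigenvalue of the signal
    lam   : noise eigenvalue
    c     : limiting ratio n/m
    """
    threshold = lam * (1.0 + math.sqrt(c))
    if lam_j > threshold:
        # Above the threshold the sample eigenvalue separates from the bulk.
        return lam_j * (1.0 + lam * c / (lam_j - lam))
    # At or below the threshold it sticks to the upper edge of the bulk.
    return lam * (1.0 + math.sqrt(c)) ** 2

strong = limiting_sample_eigenvalue(10.0, 1.0, 4.0)  # separates from the bulk
weak = limiting_sample_eigenvalue(3.0, 1.0, 4.0)     # absorbed into the bulk edge
```

With $c = 4$ and $\lambda = 1$ the threshold is 3, so a signal with $\lambda_j = 3$ is indistinguishable (asymptotically) from the largest noise eigenvalue, while $\lambda_j = 10$ is not. The two branches agree at the threshold itself, so the limit is continuous in $\lambda_j$.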


■  3.4.3  Gaussian  fluctuations  of  largest  ("signal")  eigenvalues 

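Proposition 3.42. Assume that $R$ and $\widehat{R}$ satisfy the hypotheses of Proposition 3.41.
If $\lambda_j > \lambda(1 + \sqrt{c})$ has multiplicity 1 and if $c_m = n/m \to c$ as $n, m \to \infty$ then

$$\sqrt{n}\left[l_j - \lambda_j\left(1 + \frac{\lambda c}{\lambda_j - \lambda}\right)\right] \xrightarrow{D} \mathcal{N}\left(0, \sigma^2_{\lambda_j}\right) \qquad (3.35)$$

where the convergence in distribution is almost sure and

$$\sigma^2_{\lambda_j} = \frac{2}{\beta}\,\lambda_j^2\left(1 - \frac{c}{(\lambda_j - \lambda)^2}\right). \qquad (3.36)$$

Proof. A matrix theoretic proof for when $c < 1$ for the real case may be found in [70]
while a determinantal proof for the complex case may be found in [11]. The result has
been strengthened by Baik and Silverstein [81] for general $c \in (0, \infty)$. □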
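The size of these fluctuations can be tabulated with a one-line formula. The sketch below assumes the variance takes the form $(2/\beta)\lambda_j^2\left(1 - c/(\lambda_j - \lambda)^2\right)$, an assumption chosen because it reproduces the figure $10^2(1 - 4/9^2) \approx 95$ quoted in Section 3.8.2 for $\lambda = 1$, $\lambda_j = 10$, $c = 4$ and complex data. The function name is hypothetical.

```python
import math

def signal_eigenvalue_variance(lam_j, lam, c, beta):
    """Assumed asymptotic variance of sqrt(n) * (l_j - limit) for a simple signal eigenvalue.

    beta = 1 for real data, beta = 2 for complex data.
    """
    if lam_j <= lam * (1.0 + math.sqrt(c)):
        # Below the phase transition the Gaussian description does not apply.
        raise ValueError("lam_j must exceed lam * (1 + sqrt(c))")
    return (2.0 / beta) * lam_j ** 2 * (1.0 - c / (lam_j - lam) ** 2)

var_complex = signal_eigenvalue_variance(10.0, 1.0, 4.0, beta=2)
var_real = signal_eigenvalue_variance(10.0, 1.0, 4.0, beta=1)
```

Under this assumed form the real case has twice the variance of the complex case, which is consistent with the faster convergence reported for complex signals in the simulations.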


■  3.5  Estimating  the  number  of  signals 

The key idea behind the proposed estimators can be succinctly summarized: when $k$
signals are present and assuming $k \ll n$, the fluctuations in the "noise" eigenvalues
are not affected by the "signal" eigenvalues. Hence, "deviations" (on the $1/n^2$ scale)
of the sample moments of subsets of sample eigenvalues, subject to a criterion that
penalizes overfitting of the model order, should provide a good estimate of the number
of signals. The Akaike Information Criterion (AIC) is applied to the noise eigenvalue
fluctuations to obtain the relevant estimator.


■  3.5.1  Akaike's  Information  Criterion 

Given $N$ observations $Y = [y(1), \ldots, y(N)]$ and a family of models, or equivalently a
parameterized family of probability densities $f(Y | \theta)$ indexed by the parameter vector
$\theta$, we select the model which gives the minimum AIC [2] defined by

$$\mathrm{AIC}_k = -2\log f(Y | \hat{\theta}) + 2k \qquad (3.37)$$

where $\hat{\theta}$ is the maximum likelihood estimate of $\theta$, and $k$ is the number of free parameters
in $\theta$. The idea behind this is that the AIC, given by (3.37), is an unbiased estimate
of the mean Kullback-Leibler distance between the modelled density $f(Y | \theta)$ and the
estimated density $f(Y | \hat{\theta})$. We apply Akaike's information criterion on the fluctuations
of the "noise" eigenvalues to detect the number of signals. The estimators presented,
in effect, treat large departures (on the $1/n^2$ scale) of the sample moments of subsets
of sample eigenvalues as reflecting the presence of a signal.
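The selection rule in (3.37) amounts to an argmin over penalized log-likelihoods. The following minimal sketch shows the mechanics on made-up numbers: the candidate names and log-likelihood values are purely illustrative and do not come from the source.

```python
def aic(log_likelihood, num_params):
    """AIC = -2 * (maximized log-likelihood) + 2 * (number of free parameters)."""
    return -2.0 * log_likelihood + 2.0 * num_params

def select_model(candidates):
    """Pick the candidate with the smallest AIC.

    candidates: iterable of (name, maximized log-likelihood, number of free parameters).
    """
    return min(candidates, key=lambda item: aic(item[1], item[2]))[0]

# Hypothetical candidates: a richer model must improve the log-likelihood
# by more than one unit per extra parameter to be preferred.
candidates = [("k=1", -100.0, 1), ("k=2", -98.5, 2), ("k=3", -98.4, 3)]
chosen = select_model(candidates)
```

Here the step from one to two parameters improves the log-likelihood by 1.5, which outweighs the penalty, while the step from two to three improves it by only 0.1, which does not.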


■  3.5.2  Unknown  noise  variance 

When the noise variance is unknown, the parameter vector of the model, denoted by
$\theta_k$, is given by

$$\theta_k = [\lambda_1, \ldots, \lambda_k, \sigma^2]'. \qquad (3.38)$$

The number of free parameters in $\theta_k$ is then $k + 1$. The maximum likelihood estimate
of the noise variance is given by

$$\hat{\sigma}^2_{(k)} = \frac{1}{n - k}\sum_{i=k+1}^{n} l_i \qquad (3.39)$$


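where $l_1 \ge \ldots \ge l_n$ are the eigenvalues of $\widehat{R}$. We regard the observation $y$ as

$$y = n\left[\frac{\frac{1}{n-k}\sum_{i=k+1}^{n} l_i^2}{\left(\frac{1}{n-k}\sum_{i=k+1}^{n} l_i\right)^2} - \left(1 + \frac{n}{m}\right)\right] - \left(\frac{2}{\beta} - 1\right)\frac{n}{m} \qquad (3.40)$$

where $\beta = 1$ (or 2) when the snapshots are real (or complex). The fluctuations of the
$n - k$ smallest ("noise") eigenvalues do not depend on the "signal" eigenvalues. The
negative log-likelihood function is given by

$$-\log f(y | \theta) = \frac{y^2}{2q^2} + \frac{1}{2}\log 2\pi q^2 \qquad (3.41)$$

where

$$q^2 = \frac{4}{\beta}\,\frac{n}{m}\left(2\frac{n^2}{m^2} + 5\frac{n}{m} + 2\right). \qquad (3.42)$$

Substituting this into (3.37) followed by some straightforward algebraic manipulations
yields the criterion listed in the lower panel of Table 3.1(a).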
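The unknown noise variance criterion can be sketched in a few lines once the sample eigenvalues are available. The implementation below is an illustration under stated assumptions, not a definitive transcription: it assumes the statistic $t_k = \left[(n-k)\sum l_i^2/(\sum l_i)^2 - (1 + n/m)\right]n - (2/\beta - 1)(n/m)$ over the $n - k$ smallest eigenvalues, the score $q_k = (\beta/2)\,t_k^2/\left[2(n/m)(2n^2/m^2 + 5n/m + 2)\right]$, and the penalty $2(k+1)$, as read from the partly garbled lower panel of Table 3.1(a). The function name is hypothetical.

```python
def estimate_num_signals(eigs, m, beta=1):
    """Estimate the number of signals from sample eigenvalues (unknown noise variance).

    eigs : positive eigenvalues of the sample covariance matrix (any order)
    m    : number of snapshots
    beta : 1 for real snapshots, 2 for complex snapshots
    """
    l = sorted(eigs, reverse=True)
    n = len(l)
    c = n / m
    # Assumed variance term, read from the source's table (an assumption).
    var_term = 2.0 * c * (2.0 * c * c + 5.0 * c + 2.0)
    best_k, best_score = 0, float("inf")
    for k in range(min(n, m)):
        tail = l[k:]                          # the n - k smallest eigenvalues
        s1 = sum(tail)
        s2 = sum(v * v for v in tail)
        t_k = ((n - k) * s2 / (s1 * s1) - (1.0 + c)) * n - (2.0 / beta - 1.0) * c
        q_k = 0.5 * beta * t_k * t_k / var_term
        score = q_k + 2.0 * (k + 1)           # AIC-style overfitting penalty
        if score < best_score:
            best_k, best_score = k, score
    return best_k

# Two deterministic sanity checks with idealized eigenvalues (n = 16, m = 32).
flat = estimate_num_signals([1.0] * 16, m=32, beta=2)
one_spike = estimate_num_signals([100.0] + [1.0] * 15, m=32, beta=2)
```

With perfectly flat eigenvalues the statistic is identical for every $k$, so the penalty selects $k = 0$; with one dominant eigenvalue the score drops sharply once that eigenvalue is excluded, so $k = 1$ is selected. Real sample eigenvalues are never this clean, so these inputs only exercise the mechanics of the criterion.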


■ 3.5.3 Known noise variance

When the noise variance, $\sigma^2$, is known, the parameter vector of the model is given
by

$$\theta_k = [\lambda_1, \ldots, \lambda_k]'. \qquad (3.43)$$

The number of free parameters in $\theta_k$ is then $k$. We regard the observation vector $y$ as

$$y = \begin{bmatrix} \sum_{i=k+1}^{n} (l_i/\sigma^2) - n \\[1ex] \sum_{i=k+1}^{n} (l_i/\sigma^2)^2 - n\left(1 + \frac{n}{m}\right) - \left(\frac{2}{\beta} - 1\right)\frac{n}{m} \end{bmatrix} \qquad (3.44)$$

where $\beta = 1$ (or 2) when the snapshots are real (or complex). The fluctuations of
the $n - k$ smallest ("noise") eigenvalues do not depend on the "signal" eigenvalues.
Following the procedure described earlier to obtain $\log f(y | \theta)$ and substituting that
expression into (3.37) yields the criterion listed in the upper panel of Table 3.1(a) with
the overfitting penalty of $2k$ instead of $2(k + 1)$ as listed. Our usage of $2(k + 1)$ instead
is motivated by aesthetic considerations; the two penalties differ by a constant, so the
minimizing value of $k$ is unaffected.

Figure 3-5 shows sample realizations of the score function illustrating how large
departures (on the $1/n^2$ scale) of the sample moments of subsets of sample eigenvalues
when appropriately penalized can yield an accurate estimate of the number of signals
present.

Figure 3-5. Sample realizations of the proposed criterion when $\lambda_1 = 10$, $\lambda_2 = 3$ and $\lambda_3 = \ldots = \lambda_n = 1$.
(a) Complex signals: $n = 16$, $m = 32$. (b) Complex signals: $n = 32$, $m = 64$.

■  3.6  Extensions  to  frequency  domain  and  vector  sensor  arrays 

When the $m$ snapshot vectors $x_i(w_j)$ for $i = 1, \ldots, m$ represent Fourier coefficient
vectors at frequency $w_j$, then the sample covariance matrix

$$\widehat{R}(w_j) = \frac{1}{m}\sum_{i=1}^{m} x_i(w_j)\,x_i(w_j)' \qquad (3.45)$$

is the periodogram estimate of the spectral density matrix at frequency $w_j$. The
time-domain approach carries over to the frequency domain so that the estimators in Table
3.1 remain applicable with $l_i = l_i(w_j)$ where $l_1(w_j) \ge l_2(w_j) \ge \ldots \ge l_n(w_j)$ are the
eigenvalues of $\widehat{R}(w_j)$.

When the signals are wideband and occupy $M$ frequency bins, denoted by $w_1, \ldots, w_M$,
then the information on the number of signals present is contained in all the bins. The
assumption that the observation time is much larger than the correlation times of the
signals (sometimes referred to as the SPLOT assumption: stationary process, long
observation time) ensures that the Fourier coefficients corresponding to the different
frequencies are statistically independent.

Thus the AIC based criterion for detecting the number of wideband signals that
occupy the frequency bands $w_1, \ldots, w_M$, as given in Table 3.1(b), is obtained by
summing the corresponding criterion in Table 3.1(a) over the frequency range of interest.
Generically, we expect to use $\beta = 2$, representing the usual complex frequency
domain representation, for the wideband frequency domain signal estimators. When the
number of snapshots is severely constrained, the SPLOT assumption is likely to be
violated so that the Fourier coefficients corresponding to different frequencies will not
be statistically independent. This will likely degrade the performance of the proposed
estimators.

When the measurement vectors represent quaternion valued narrowband signals,
then $\beta = 4$ so that the estimators in Table 3.1(a) can be used. Quaternion valued vectors
arise when the data collected from vector sensors is represented using quaternions as




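Known noise variance:

$$t_k = \begin{bmatrix} \sum_{i=k+1}^{n} (l_i/\sigma^2) - n \\[1ex] \sum_{i=k+1}^{n} (l_i/\sigma^2)^2 - n\left(1 + \frac{n}{m}\right) - \left(\frac{2}{\beta} - 1\right)\frac{n}{m} \end{bmatrix}$$

$$q_k = \frac{\beta}{2}\; t_k' \begin{bmatrix} \frac{n}{m} & 2\frac{n}{m}\left(\frac{n}{m} + 1\right) \\[1ex] 2\frac{n}{m}\left(\frac{n}{m} + 1\right) & 2\frac{n}{m}\left(2\frac{n^2}{m^2} + 5\frac{n}{m} + 2\right) \end{bmatrix}^{-1} t_k$$

Unknown noise variance:

$$t_k = \left[(n - k)\,\frac{\sum_{i=k+1}^{n} l_i^2}{\left(\sum_{i=k+1}^{n} l_i\right)^2} - \left(1 + \frac{n}{m}\right)\right] n - \left(\frac{2}{\beta} - 1\right)\frac{n}{m}$$

$$q_k = \frac{\beta}{2}\;\frac{t_k^2}{2\frac{n}{m}\left(2\frac{n^2}{m^2} + 5\frac{n}{m} + 2\right)}$$

Number of signals: $\hat{k} = \arg\min_{k \in \mathbb{N}:\, 0 \le k < \min(n, m)} \; q_k + 2(k + 1)$

(a) Time domain or narrowband frequency domain signals: $\beta = 1$ (or 2) when
signals are real (or complex).


Known noise variance:

$$t_{j,k} = \begin{bmatrix} \sum_{i=k+1}^{n} \left(l_i(w_j)/\sigma^2(w_j)\right) - n \\[1ex] \sum_{i=k+1}^{n} \left(l_i(w_j)/\sigma^2(w_j)\right)^2 - n\left(1 + \frac{n}{m}\right) - \left(\frac{2}{\beta} - 1\right)\frac{n}{m} \end{bmatrix}$$

$$q_{j,k} = \frac{\beta}{2}\; t_{j,k}' \begin{bmatrix} \frac{n}{m} & 2\frac{n}{m}\left(\frac{n}{m} + 1\right) \\[1ex] 2\frac{n}{m}\left(\frac{n}{m} + 1\right) & 2\frac{n}{m}\left(2\frac{n^2}{m^2} + 5\frac{n}{m} + 2\right) \end{bmatrix}^{-1} t_{j,k}$$

Unknown noise variance:

$$t_{j,k} = \left[(n - k)\,\frac{\sum_{i=k+1}^{n} l_i(w_j)^2}{\left(\sum_{i=k+1}^{n} l_i(w_j)\right)^2} - \left(1 + \frac{n}{m}\right)\right] n - \left(\frac{2}{\beta} - 1\right)\frac{n}{m}$$

$$q_{j,k} = \frac{\beta}{2}\;\frac{t_{j,k}^2}{2\frac{n}{m}\left(2\frac{n^2}{m^2} + 5\frac{n}{m} + 2\right)}$$

Number of signals: $\hat{k} = \arg\min_{k \in \mathbb{N}:\, 0 \le k < \min(n, m)} \; \sum_{j=1}^{M} q_{j,k} + 2M(k + 1)$

(b) Wideband signals occupying $M$ frequency bins: $\beta = 1$ (or 2) when signals are
real (or complex).


Table 3.1. Estimating the number of signals from the eigenvalues of a SCM formed from $m$ snapshots.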
■ 3.7 Effective number of signals and consistency of the criteria


Theorem 3.71. Let $R$ and $\widetilde{R}$ be two $n \times n$ sized covariance matrices whose eigenvalues
are related as

$$\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p, \lambda_{p+1}, \ldots, \lambda_k, \lambda, \ldots, \lambda) \qquad (3.46a)$$

$$\widetilde{\Lambda} = \mathrm{diag}(\lambda_1, \ldots, \lambda_p, \lambda, \ldots, \lambda) \qquad (3.46b)$$

where for some $c \in (0, \infty)$, and all $i = p + 1, \ldots, k$, $\lambda_i \le \lambda(1 + \sqrt{c})$. Let $\widehat{R}$ and $\widehat{\widetilde{R}}$ be
the associated sample covariance matrices formed from $m$ snapshots. Then for every
$n$, $m(n) \to \infty$ such that $c_m = n/m \to c$,

$$\mathrm{Prob}(\hat{k} = j \,|\, R) \to \mathrm{Prob}(\hat{k} = j \,|\, \widetilde{R}) \quad \text{for } j = 1, \ldots, p \qquad (3.47a)$$

and

$$\mathrm{Prob}(\hat{k} > p \,|\, R) \to \mathrm{Prob}(\hat{k} > p \,|\, \widetilde{R}) \qquad (3.47b)$$

where the convergence is almost sure and $\hat{k}$ is the estimate of the number of signals
obtained using the algorithms summarized in Table 3.1(a).


Proof. The theorem follows from Proposition 3.41. The almost sure convergence of
the sample eigenvalues $l_j \to \lambda(1 + \sqrt{c})^2$ for $j = p + 1, \ldots, k$ implies that the $j$-th largest
eigenvalues of $\widehat{R}$ and $\widehat{\widetilde{R}}$ converge to the same limit almost surely. The fluctuations
about this limit will hence be identical so that (3.47) follows in the asymptotic limit. □

Note that the rate of convergence to the asymptotic limit for $\mathrm{Prob}(\hat{k} > p \,|\, R)$ and
$\mathrm{Prob}(\hat{k} > p \,|\, \widetilde{R})$ will, in general, depend on the eigenvalue structure and may be
arbitrarily slow. Thus, Theorem 3.71 yields no insight into rate of convergence type issues,
which are important in practice. Rather, the theorem is a statement on the asymptotic
equivalence, from an identifiability point of view, of sequences of sample covariance
matrices which are related in the manner described. At this point, we are unable to
prove the consistency of the proposed estimators as this would require a more refined
analysis that characterizes the fluctuations of subsets of the (ordered) "noise"
eigenvalues. The statement regarding consistency of the proposed estimator is presented as a
conjecture with numerical simulations used as (non-definitive) evidence.

Conjecture 3.72. Let $R$ be an $n \times n$ covariance matrix that satisfies the hypothesis
of Proposition 3.41. Let $\widehat{R}$ be a sample covariance matrix formed from $m$ snapshots.
Define

$$k_{\mathrm{eff}}(c \,|\, R) := \text{Number of eigenvalues of } R > \lambda(1 + \sqrt{c}). \qquad (3.48)$$

Then in the $m, n \to \infty$ limit with $c_m = n/m \to c$, $\hat{k}$ is a consistent estimator of $k_{\mathrm{eff}}(c \,|\, R)$,
where $\hat{k}$ is the estimate of the number of signals obtained using the algorithms
summarized in Table 3.1(a).
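The effective number of signals in (3.48) is a simple count, sketched below. The function name and the example spectrum are illustrative; the spectrum matches the one used in the simulations of Section 3.8.2 ($\lambda_1 = 10$, $\lambda_2 = 3$, unit noise eigenvalues).

```python
import math

def effective_num_signals(true_eigs, lam, c):
    """Count the eigenvalues of the true covariance above lam * (1 + sqrt(c)), as in (3.48)."""
    threshold = lam * (1.0 + math.sqrt(c))
    return sum(1 for v in true_eigs if v > threshold)

spectrum = [10.0, 3.0] + [1.0] * 14

k_eff_few_snapshots = effective_num_signals(spectrum, lam=1.0, c=4.0)   # n = 4m
k_eff_many_snapshots = effective_num_signals(spectrum, lam=1.0, c=0.5)  # m = 2n
```

With $c = 4$ the threshold is 3, so only the eigenvalue at 10 counts and the weaker signal is asymptotically undetectable from the sample eigenvalues alone. With $c = 0.5$ the threshold drops to about 1.71 and both signals count.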
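■ 3.7.1 The asymptotic identifiability of two closely spaced signals

Suppose there are two uncorrelated Gaussian (hence, independent) signals whose
covariance matrix is the diagonal matrix $R_s = \mathrm{diag}(\sigma^2_{S1}, \sigma^2_{S2})$. In (3.1) and (3.3), let
$A = [v_1 \; v_2]$. In a sensor array processing application, we think of $v_1 \equiv v(\theta_1)$ and
$v_2 \equiv v(\theta_2)$ as encoding the array manifold vectors for a source and an interferer with
powers $\sigma^2_{S1}$ and $\sigma^2_{S2}$, located at $\theta_1$ and $\theta_2$, respectively. The covariance matrix given by

$$R = \sigma^2_{S1}\, v_1 v_1' + \sigma^2_{S2}\, v_2 v_2' + \sigma^2 I \qquad (3.49)$$

has the $n - 2$ smallest eigenvalues $\lambda_3 = \ldots = \lambda_n = \sigma^2$ and the two largest eigenvalues

$$\lambda_1 = \sigma^2 + \frac{\sigma^2_{S1}\|v_1\|^2 + \sigma^2_{S2}\|v_2\|^2}{2} + \frac{\sqrt{\left(\sigma^2_{S1}\|v_1\|^2 - \sigma^2_{S2}\|v_2\|^2\right)^2 + 4\sigma^2_{S1}\sigma^2_{S2}\,|\langle v_1, v_2\rangle|^2}}{2} \qquad (3.50a)$$

$$\lambda_2 = \sigma^2 + \frac{\sigma^2_{S1}\|v_1\|^2 + \sigma^2_{S2}\|v_2\|^2}{2} - \frac{\sqrt{\left(\sigma^2_{S1}\|v_1\|^2 - \sigma^2_{S2}\|v_2\|^2\right)^2 + 4\sigma^2_{S1}\sigma^2_{S2}\,|\langle v_1, v_2\rangle|^2}}{2} \qquad (3.50b)$$

respectively. Applying the result in Proposition 3.41 allows us to express the effective
number of signals as

$$k_{\mathrm{eff}} = \begin{cases} 2 & \text{if } \sigma^2\left(1 + \sqrt{\frac{n}{m}}\right) < \lambda_2 \\[1ex] 1 & \text{if } \lambda_2 \le \sigma^2\left(1 + \sqrt{\frac{n}{m}}\right) < \lambda_1 \\[1ex] 0 & \text{if } \lambda_1 \le \sigma^2\left(1 + \sqrt{\frac{n}{m}}\right) \end{cases} \qquad (3.51)$$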

In the special situation when $\|v_1\| = \|v_2\| = \|v\|$ and $\sigma^2_{S1} = \sigma^2_{S2} = \sigma^2_S$, we can (in an
asymptotic sense) reliably detect the presence of both signals from the sample
eigenvalues alone whenever

$$\sigma^2_S\,\|v\|^2\left(1 - \frac{|\langle v_1, v_2\rangle|}{\|v\|^2}\right) > \sigma^2\sqrt{\frac{n}{m}}. \qquad (3.52)$$

Equation (3.52) captures the tradeoff between the identifiability of two closely spaced
signals, the dimensionality of the system, the number of available snapshots and the
cosine of the angle between the vectors $v_1$ and $v_2$. It may prove to be a useful heuristic
for experimental design.
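The two-signal case can be explored numerically. The sketch below computes the two largest eigenvalues of the covariance in (3.49) from the closed-form expressions for a rank-two perturbation of $\sigma^2 I$ and then counts how many exceed the threshold $\sigma^2(1 + \sqrt{n/m})$. The function names and the example vectors are hypothetical; steering vectors are passed as plain Python sequences of real or complex numbers.

```python
import math

def two_signal_eigenvalues(v1, v2, p1, p2, noise_var):
    """Two largest eigenvalues of p1*v1*v1' + p2*v2*v2' + noise_var*I."""
    n1 = sum(abs(a) ** 2 for a in v1)                      # ||v1||^2
    n2 = sum(abs(a) ** 2 for a in v2)                      # ||v2||^2
    inner = abs(sum(a * complex(b).conjugate() for a, b in zip(v1, v2)))
    half_sum = 0.5 * (p1 * n1 + p2 * n2)
    half_root = 0.5 * math.sqrt((p1 * n1 - p2 * n2) ** 2
                                + 4.0 * p1 * p2 * inner ** 2)
    return noise_var + half_sum + half_root, noise_var + half_sum - half_root

def effective_signals_two_sources(lam1, lam2, noise_var, n, m):
    """Effective number of signals for the two-source model."""
    threshold = noise_var * (1.0 + math.sqrt(n / m))
    if lam2 > threshold:
        return 2
    if lam1 > threshold:
        return 1
    return 0

# Orthogonal unit-norm steering vectors with powers 9 and 2 in unit-variance noise.
lam1, lam2 = two_signal_eigenvalues([1, 0, 0, 0], [0, 1, 0, 0], 9.0, 2.0, 1.0)
k_scarce = effective_signals_two_sources(lam1, lam2, 1.0, n=64, m=16)
k_ample = effective_signals_two_sources(lam1, lam2, 1.0, n=16, m=16)
```

For orthogonal vectors the eigenvalues are simply $\sigma^2$ plus each signal power, here 10 and 3. As the vectors become more aligned, the inner product grows, $\lambda_2$ falls towards $\sigma^2$, and more snapshots are needed before both signals count as identifiable.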
■ 3.8 Numerical simulations

■ 3.8.1 Test of sphericity

When $R = \sigma^2 I$, so that $k = 0$, we evaluate the performance of the proposed estimators
by examining the probability that $\hat{k} = 0$ over 20,000 Monte-Carlo trials. Figure 3-7
compares the empirical results as a function of the number of snapshots, for different
values of $n$. The Wax-Kailath estimators always over-estimate the number of signals
for the sample sizes considered. In contrast, the proposed algorithms correctly predict
the number of signals more than 90% of the time even when the dimensionality of
the system is as small as 8. The simulations suggest large sample consistency of the
estimators, with the complex signal case exhibiting a faster rate of convergence than
the real signal case, as expected from Proposition 3.32. The characteristic U-shape of
the performance curve appears because the noise eigenvalue fluctuations of the SCM
with $n = 32$, $m = 16$ are identical to those of an SCM with $n = 16$, $m = 32$.

The superior detection performance of the estimator in the unknown noise variance
scenario when $R = \sigma^2 I$ comes as no surprise since its criterion involves comparing the
fluctuation of just a single moment of the SCM. Given the inherent symmetries of the
null hypothesis, the degradation in the performance of the estimator when the noise
variance is unknown and must be estimated will only be revealed in tests involving the
detection of $k > 0$ signals.

■ 3.8.2 Illustration of effective number of signals concept

Consider the detection problem on a covariance matrix $R$ with $n - 2$ eigenvalues of
magnitude $\sigma^2 = 1$, $\lambda_1 = 10$ and $\lambda_2 = 3$. Figure 3-8 compares the empirical probability
(over 20,000 Monte-Carlo trials) of detecting 2 signals for the proposed estimators for
a range of values of $n$ and $m$. The empirical probability of Wax-Kailath estimators
detecting 2 signals over these trials is identically zero. Note how the complex valued
case performs better than the real valued case for the same $(n, m)$ pair. This
rate-of-convergence type effect is expected given the behavior of the associated fluctuations
in Proposition 3.32. The simulations suggest that the estimators exhibit large sample
consistency with a faster rate of convergence for complex signals than real signals.
Figure 3-9 compares the performance of the two estimators; the case where the noise
variance is known performs better, as expected.

Parsing the empirical data differently allows us to illustrate the relevance of the effective
number of signals concept in light of the discussion in Section 3.7. For the covariance
matrix considered, when $n = 4m$, $c_m = 4$ so that the effective number of signals,
from (3.48), equals 1: the threshold $\lambda(1 + \sqrt{c}) = 3$ is exceeded by $\lambda_1 = 10$ but not by
$\lambda_2 = 3$. Figure 3-10 compares the empirical probability of detecting zero
and one signals, when the signals are real and complex, for different values of $n$ with
$m = n/4$. The simulations illustrate the consistency of the proposed estimators in the
$n, m(n) \to \infty$ limit, with respect to the effective number of signals, as conjectured.

The rate of convergence to the asymptotic result is faster for the complex signals
case, as before. The probability of detecting zero signals decays to zero as the
probability of detecting one signal approaches one. This highlights the relevance of the
effective number of signals concept in high-dimensional settings. In moderate dimensional
settings, the fluctuations of the signal eigenvalues, when combined with the concept
of effective rank, best capture the inherent difficulties in the formulation of the detection
problem.

Figure 3-6. "Signal" and "noise" eigenvalue fluctuations induced limits on signal identifiability.
(b) $n$ and $m$ not "large enough."

For the example considered, when the signal is complex, the largest (and only) signal
eigenvalue fluctuates about $10(1 + 4/9) \approx 14.4$ with a variance, given by Proposition
3.42, approximately equal to $10^2(1 - 4/9^2)/n \approx 95/n$. The largest noise eigenvalue
fluctuates about $(1 + \sqrt{4})^2 = 9$. Reliable detection of the effective number of
signals occurs in Figure 3-10 for values of $n$ large enough that the separation between
the signal eigenvalue and the largest noise eigenvalue is roughly 5 to 7 times the variance
of  the  signal  eigenvalue  fluctuation  (as  in  Figure  3-6(a)).  For  values  of  n  smaller  than 
that,  the  signal  eigenvalue  is  insufficiently  separated  from  the  noise  eigenvalue  to  be 
identified  as  such  (as  in  Figure  3-6(b)).  In  this  context,  moderate  dimensionality  is  a 
greater  curse  than  high-dimensionality  because  the  fluctuations  of  the  signal  and  noise 
eigenvalues  make  the  signal  versus  noise  decidability  issue  even  more  challenging. 
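
The arithmetic in this example can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `spiked_eigenvalue_summary` is hypothetical, and the two expressions inside it are simply the complex-case formulas read off from the numbers quoted above (unit noise variance, $c = n/m$, signal eigenvalue above the detection threshold), not a restatement of Proposition 3.42.

```python
import math


def spiked_eigenvalue_summary(lam, c, n):
    """Return (limiting mean, approximate variance, noise edge) for one signal eigenvalue.

    Assumptions (inferred from the numbers quoted in the text, complex case):
    unit noise variance, c = n/m, and lam large enough that the signal
    eigenvalue separates from the noise eigenvalues.
    """
    mean = lam * (1.0 + c / (lam - 1.0))                     # 10 (1 + 4/9)
    variance = lam**2 * (1.0 - c / (lam - 1.0) ** 2) / n     # 10^2 (1 - 4/9^2) / n
    noise_edge = (1.0 + math.sqrt(c)) ** 2                   # (1 + sqrt(4))^2
    return mean, variance, noise_edge


# Values used in the example: signal eigenvalue 10, c = 4; n = 100 is arbitrary.
mean, variance, noise_edge = spiked_eigenvalue_summary(lam=10.0, c=4.0, n=100)
separation = mean - noise_edge
```

Here `separation` is the gap between the signal eigenvalue location and the largest noise eigenvalue location, which is the quantity compared against the fluctuation size in the discussion above.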


■  3.9  Future  work 

We  have  developed  an  approach  for  detecting  the  number  of  signals  in  white  noise  from 
the  sample  eigenvalues  alone.  The  proposed  estimators  explicitly  take  into  account  the 
blurring  of  the  sample  eigenvalues  due  to  the  finite  size  and  are  hence  able  to  outperform 
traditional estimators that do not exploit this information.

In  principle,  we  could  have  formulated  our  algorithms  to  consider  the  fluctuations 
of the “signal” eigenvalues as well instead of focussing on the fluctuations of the “noise”
eigenvalues alone. Such an algorithm would be computationally more complex because
we would have to first obtain the maximum likelihood estimate of the signal eigenvalue
using the results in Proposition 3.42. Since the distribution of the signal eigenvalue
depends on its multiplicity, formulating the problem in terms of a joint signal and noise
eigenvalue  estimation-detection  framework  implies  that  the  practitioner  would  be  forced 
to  make  subjective  assumptions  on  the  multiplicity  of  the  individual  signal  eigenvalues. 
This  makes  such  a  formulation  less  desirable  than  the  noise  eigenvalue  only  solution 
proposed. 

Future work would be to analytically prove the conjecture stated regarding the
consistency of the algorithms. It would be of value to compare the present AIC based
formulation to the MDL/BIC based formulation for the proposed algorithms. It remains
an open question to analyze and design such eigenvalue based signal detection
algorithms in the Neyman-Pearson sense, i.e., finding the most powerful test that does
not exceed a threshold probability of false detection. Finer properties, perhaps buried in
the rate of convergence to the asymptotic results used, might be useful in this context.
Such estimators will require practitioners to set thresholds. Though this is something
we instinctively shy away from, if the performance can be significantly improved then
this may be a price we are ready to pay, especially for the detection of low-level
signals right around the threshold where the phase transition phenomenon kicks in.
Figure 3-7. Performance of proposed estimators when there are zero signals in white noise. Note that $10^{-0.01} \approx 0.9772$ while $10^{-0.04} \approx 0.9120$. Panels: (a) Real signals: known noise variance. (b) Complex signals: known noise variance. (c) Real signals: unknown noise variance. (d) Complex signals: unknown noise variance.
Figure 3-8. Performance of proposed estimators when there are 2 signals in white noise. Panels: (a) Real signals: known noise variance. (b) Complex signals: known noise variance. (c) Real signals: unknown noise variance. (d) Complex signals: unknown noise variance.


Figure 3-9. Comparison of the known vs. unknown estimators. Panels: (a) Real signals. (b) Complex signals. Horizontal axis: $\log_{10}$ # Snapshots.
Figure 3-10. Comparison of the known vs. unknown estimators. Panels: (a) Real signals: 1 signal hypothesis. (b) Real signals: 0 signals hypothesis. (c) Complex signals: 1 signal hypothesis. (d) Complex signals: 0 signals hypothesis. Horizontal axis: $\log_{10}$ Dimension.


Chapter  4 


Statistical eigen-inference: Large Wishart matrices


In this chapter we expand the inference methodologies developed in Chapter 3 to a
broader class of covariance matrices. This chapter is organized as follows. We motivate
the problem in Section 4.1 and preview the structure of the proposed algorithms
summarized in Table 4.1. In Section 4.2 we introduce the necessary definitions and
summarize the relevant random matrix theorems that we exploit. Concrete algorithms
for computing the analytic expectations that appear in the algorithms (summarized in
Table 4.1) are presented in Section 4.3. The eigen-inference techniques are developed
in Section 4.4. The performance of the algorithms is illustrated using Monte-Carlo
simulations in Section 4.5. Some concluding remarks are presented in Section 4.7.


■  4.1  Problem  formulation 

Let $X = [x_1, \ldots, x_m]$ be an $n \times m$ data matrix where $x_1, \ldots, x_m$ denote $m$ independent
measurements, where for each $i$, $x_i$ has an $n$-dimensional (real or complex) Gaussian
distribution with mean zero, and positive definite covariance matrix $\Sigma$. The sample
covariance matrix (SCM) when formed from these $m$ samples as

$$S := \frac{1}{m} \sum_{i=1}^{m} x_i x_i' = \frac{1}{m} X X', \tag{4.1}$$

has the (central) Wishart distribution [114]. We focus on inference problems for
parameterized covariance matrices modelled as $\Sigma_\theta = U \Lambda_\theta U'$ where

$$\Lambda_\theta = \begin{bmatrix} a_1 I_{n_1} & & & \\ & a_2 I_{n_2} & & \\ & & \ddots & \\ & & & a_k I_{n_k} \end{bmatrix}, \tag{4.2}$$

where $a_1 > \ldots > a_k$ and $\sum_{j=1}^{k} n_j = n$. Defining $t_i = n_i/n$ allows us to conveniently
express the $2k - 1$ dimensional parameter vector as $\theta = (t_1, \ldots, t_{k-1}, a_1, \ldots, a_k)$ with
the obvious non-negativity constraints on the elements.
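
To make the model concrete, the following is a minimal NumPy sketch of forming an SCM as in (4.1) from a block-structured eigenvalue model as in (4.2). It assumes the real-valued case with $U = I$, and the helper names (`block_eigenvalues`, `sample_covariance`) as well as the particular sizes and seed are chosen here only for illustration.

```python
import numpy as np


def block_eigenvalues(a, multiplicities):
    """Diagonal of Lambda_theta in (4.2): a_l repeated n_l times."""
    return np.repeat(np.asarray(a, dtype=float), multiplicities)


def sample_covariance(X):
    """SCM of an n x m data matrix whose columns are the measurements, as in (4.1)."""
    m = X.shape[1]
    return (X @ X.conj().T) / m


rng = np.random.default_rng(0)                 # arbitrary seed
n, m = 6, 2000
pop_eigs = block_eigenvalues([3.0, 1.0], [2, 4])   # a_1 = 3 (n_1 = 2), a_2 = 1 (n_2 = 4)
# With U = I (an assumption of this sketch) the covariance is diagonal,
# so each row of standard normal draws is scaled by sqrt(eigenvalue).
X = np.sqrt(pop_eigs)[:, None] * rng.standard_normal((n, m))
S = sample_covariance(X)
t = np.array([2, 4]) / n                       # t_i = n_i / n
```

For this two-block model the parameter vector would be $\theta = (t_1, a_1, a_2)$, which has $2k - 1 = 3$ entries.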

Models of the form in (4.2) arise whenever the measurements are of the form

$$x_i = A s_i + z_i \quad \text{for } i = 1, \ldots, m \tag{4.3}$$

where $z_i \sim \mathcal{N}_n(0, \sigma^2 I)$ denotes an $n$-dimensional (real or complex) Gaussian noise
vector where $\sigma^2$ is generically unknown, $s_i \sim \mathcal{N}_k(0, R_s)$ denotes a $k$-dimensional
(real or complex) Gaussian signal vector with covariance $R_s$, and $A$ is an
$n \times k$ unknown non-random matrix. In array processing applications, the $j$-th column
of the matrix $A$ encodes the parameter vector associated with the $j$-th signal whose
magnitude is described by the $j$-th element of $s_i$.

Since the signal and noise vectors are independent of each other, the covariance
matrix of $x_i$ can hence be decomposed as

$$\Sigma = \Psi + \Sigma_z \tag{4.4}$$

where $\Sigma_z$ is the covariance of $z$ and $\Psi = A R_s A'$ with $'$ denoting the conjugate
transpose. One way of obtaining $\Sigma$ with eigenvalues of the form in (4.2) was described in
Chapter 3: when $\Sigma_z = \sigma^2 I$, if the matrix $A$ is of full column rank and the covariance
matrix of the signals $R_s$ is nonsingular, then the $n - k$ smallest eigenvalues of $\Psi$ are
equal to zero, so that the $n - k$ smallest eigenvalues of $\Sigma$ are equal to $\sigma^2$ and
the eigenvalues of $\Sigma$ will be of the form in (4.2). Alternately, if the eigenvalues of $\Psi$
and $\Sigma_z$ have the identical subspace structure, i.e., in (4.2), $t_i^{\Psi} = t_i^{\Sigma_z}$ for all $i$, then
whenever the eigenvectors associated with each of the subspaces of $\Psi$ and $\Sigma_z$ align, the
eigenvalues of $\Sigma$ will have the subspace structure in (4.2).


■  4.1.1  Inferring  the  population  eigenvalues  from  the  sample  eigenvalues 

While inference problems for these models have been documented in texts such as [67],
the inadequacies of classical algorithms in high-dimensional, (relatively) small sample
size  settings  have  not  been  adequately  addressed.  We  highlight  some  of  the  prevalent 
issues  in  the  context  of  statistical  inference  and  hypothesis  testing. 

Anderson's landmark paper [6] develops the theory that describes the (large sample)
asymptotics of the sample eigenvalues (in the real valued case) for such models when the
true covariance matrix has eigenvalues of arbitrary multiplicity. Indeed, for arbitrary
covariance $\Sigma$, the joint density function of the eigenvalues $l_1, \ldots, l_n$ of the SCM $S$ when
$m > n + 1$ is shown to be given by

$$Z^{\beta}_{n,m} \prod_{i=1}^{n} l_i^{\beta(m-n+1)/2 - 1} \prod_{i<j} |l_i - l_j|^{\beta} \int_{Q} \exp\left( -\frac{m\beta}{2} \mathrm{Tr}\left( \Sigma^{-1} Q S Q' \right) \right) dQ \tag{4.5}$$

where $l_1 > \ldots > l_n > 0$, $Z^{\beta}_{n,m}$ is a normalization constant, and $\beta = 1$ (or 2) when $S$
is real (resp. complex). In (4.5), $Q \in O(n)$ when $\beta = 1$ while $Q \in U(n)$ when $\beta = 2$
where $O(n)$ and $U(n)$ are, respectively, the set of $n \times n$ orthogonal and unitary matrices
with Haar measure. Anderson notes that

If the characteristic roots of $\Sigma$ are different, the deviations ... from the
corresponding population quantities are asymptotically normally distributed.
When some of the roots of $\Sigma$ are equal, the asymptotic distribution cannot
be described so simply.

Indeed, the difficulty alluded to arises due to the presence of the integral over the orthogonal
(or unitary) group on the right hand side of (4.5). This problem is compounded in
situations when some of the eigenvalues of $\Sigma$ are equal as is the case for the model
considered in (4.2). Nonetheless, Anderson is able to use the (large sample) asymptotics
to derive the maximum likelihood estimate of the population eigenvalues, $a_l$, as

$$\hat{a}_l \approx \frac{1}{n_l} \sum_{j \in N_l} \lambda_j \quad \text{for } l = 1, \ldots, k, \tag{4.6}$$

where $\lambda_j$ are the sample eigenvalues (arranged in descending order) and $N_l$ is the set
of integers $n_1 + \ldots + n_{l-1} + 1, \ldots, n_1 + \ldots + n_l$. This is a reasonable estimator that
works well in practice when $m \gg n$. The large sample size asymptotics are, however, of
limited utility because they ignore the (significant) effect of the dimensionality of the
system on the behavior of the sample eigenvalues.

Consequently, (large sample size) asymptotic predictions, derived under the $n$ fixed,
$m \to \infty$ regime do not account for the additional complexities that arise in situations
where the sample size $m$ is large but the dimensionality $n$ is of comparable order.
Furthermore, the estimators developed using the classical large sample asymptotics
invariably become degenerate whenever $n > m$, so that $n - m$ of the sample eigenvalues will
be identically equal to zero. For example, when $m = n/2$, and there are two distinct
population eigenvalues each with multiplicity $n/2$ then the estimate of the smallest eigenvalue
using (4.6) will be zero. Other such scenarios where the population eigenvalue estimates
obtained using (4.6) are meaningless are easy to construct and are practically relevant
in many applications such as radar and sonar signal processing [90,102], and many more.
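
The degenerate case described above is easy to reproduce numerically. The sketch below is illustrative only: `anderson_block_estimate` is a hypothetical helper that averages the sorted sample eigenvalues over each block, which is how the classical estimator in (4.6) is read here, and the dimensions, eigenvalues, and seed are arbitrary choices (real-valued data, $U = I$).

```python
import numpy as np


def anderson_block_estimate(sample_eigs, multiplicities):
    """Average the sorted sample eigenvalues over each block N_l, as in (4.6)."""
    eigs = np.sort(np.asarray(sample_eigs, dtype=float))[::-1]  # descending order
    estimates, start = [], 0
    for n_l in multiplicities:
        estimates.append(eigs[start:start + n_l].mean())
        start += n_l
    return np.array(estimates)


rng = np.random.default_rng(1)              # arbitrary seed
n = 40
m = n // 2                                  # fewer samples than dimensions
pop_eigs = np.repeat([3.0, 1.0], n // 2)    # two eigenvalues, multiplicity n/2 each
X = np.sqrt(pop_eigs)[:, None] * rng.standard_normal((n, m))
S = X @ X.T / m                             # SCM as in (4.1); rank is at most m
a_hat = anderson_block_estimate(np.linalg.eigvalsh(S), [n // 2, n // 2])
```

Because $S$ has rank at most $m = n/2$, the $n/2$ smallest sample eigenvalues vanish (up to floating-point error), so the second entry of `a_hat` is numerically zero even though the corresponding population eigenvalue is 1.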

There are, of course, other strategies one may employ for inferring the population
eigenvalues. One might consider a maximum-likelihood technique based on maximizing
the log-likelihood function of the observed data $X$ which is given by (ignoring constants)

$$l(X|\Sigma) := -m\left( \mathrm{tr}\, S \Sigma^{-1} + \log\det \Sigma \right),$$

or, equivalently, when $\Sigma = U \Lambda U'$, by minimizing the objective function

$$h(X|U, \Lambda) = \left( \mathrm{tr}\, S U \Lambda^{-1} U' + \log\det \Lambda \right). \tag{4.7}$$


What should be apparent on inspecting (4.7) is that the maximum-likelihood estimation
of the parameters of $\Lambda$ of the form in (4.2) requires us to model the population
eigenvectors $U$ as well (except when $k = 1$). If $U$ were known a priori, then an estimate
of $a_l$ obtained as

$$\hat{a}_l \approx \frac{1}{n_l} \sum_{j \in N_l} (U' S U)_{j,j} \quad \text{for } l = 1, \ldots, k, \tag{4.8}$$

where $N_l$ is the set of integers $n_1 + \ldots + n_{l-1} + 1, \ldots, n_1 + \ldots + n_l$, will provide a good estimate.
In practical applications, the population eigenvectors might either be unknown or be
misspecified leading to faulty inference. Hence it is important to have the ability to
perform statistically sound, computationally feasible eigen-inference of the population
eigenvalues, i.e., from the sample eigenvalues alone, in a manner that is robust to
high-dimensionality and sample size constraints.

We illustrate the difficulties encountered in high-dimensional settings with an example
(summarized in Figure 4-1) of a SCM constructed from a covariance matrix modelled
as $\Sigma = U \Lambda U'$ with $n = 100$ and sample size $m = 300$. Half of the eigenvalues of $\Lambda$ are
of magnitude 3 while the remainder are of magnitude 1. The sample eigenvalues are
significantly blurred, relative to the true eigenvalues, as shown in Figure 4-1(a). Figures
4-1(b) and 4-1(d) plot the sample eigenvectors for the case when the true eigenvectors
$U = I$, and an arbitrary $U$, respectively. Figures 4-1(c) and 4-1(e) plot the diagonal
elements $(S)_{j,j}$. Thus, if the true eigenvectors were indeed $U = I$ then an estimate of the
population eigenvalues formed as in (4.8) yields a good estimate; when $U \neq I$, however,
the estimate is very poor.
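
The effect of eigenvector misspecification in this example can be sketched numerically as follows. This is a hedged illustration rather than the code used for Figure 4-1: `block_diag_estimate` is a hypothetical helper implementing the block average in (4.8), the data are real-valued, and the "arbitrary" $U$ is drawn here from a QR factorization of a Gaussian matrix with an arbitrary seed.

```python
import numpy as np


def block_diag_estimate(S, U, multiplicities):
    """Block averages of the diagonal of U' S U, as in (4.8)."""
    d = np.real(np.diag(U.conj().T @ S @ U))
    estimates, start = [], 0
    for n_l in multiplicities:
        estimates.append(d[start:start + n_l].mean())
        start += n_l
    return np.array(estimates)


rng = np.random.default_rng(2)                     # arbitrary seed
n, m = 100, 300
pop_eigs = np.repeat([3.0, 1.0], n // 2)
Z = rng.standard_normal((n, m))
X_identity = np.sqrt(pop_eigs)[:, None] * Z        # true eigenvectors U = I
U, _ = np.linalg.qr(rng.standard_normal((n, n)))   # an arbitrary orthogonal U
X_rotated = U @ X_identity                         # true eigenvectors U != I

S_identity = X_identity @ X_identity.T / m
S_rotated = X_rotated @ X_rotated.T / m
blocks = [n // 2, n // 2]

good = block_diag_estimate(S_identity, np.eye(n), blocks)   # U correctly assumed to be I
bad = block_diag_estimate(S_rotated, np.eye(n), blocks)     # U wrongly assumed to be I
fixed = block_diag_estimate(S_rotated, U, blocks)           # correct U supplied
```

With the correct eigenvectors the two block averages land near 3 and 1, whereas wrongly assuming $U = I$ smears both estimates toward the overall average eigenvalue of 2.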


■ 4.1.2 Testing for equality of population eigenvalues

Similar difficulties are encountered in problems of testing as well. In such situations,
Anderson proposes the likelihood ratio criterion for testing the hypothesis that the
population eigenvalues indexed by $N_l$ are equal, given by

$$\left[ \frac{\prod_{j \in N_l} \lambda_j}{\left( n_l^{-1} \sum_{j \in N_l} \lambda_j \right)^{n_l}} \right]^{m/2} \quad \text{for } l = 1, \ldots, k, \tag{4.9}$$

where $\lambda_j$ are the sample eigenvalues (arranged in descending order) and $N_l$ is the set
of integers $n_1 + \ldots + n_{l-1} + 1, \ldots, n_1 + \ldots + n_l$. The test in (4.9) suffers from the same
deficiency as the population eigenvalue estimator in (4.6): it becomes degenerate when
$n > m$. When the population eigenvectors $U$ are known, (4.9) may be modified by
forming the criterion

$$\left[ \frac{\prod_{j \in N_l} (U' S U)_{j,j}}{\left( n_l^{-1} \sum_{j \in N_l} (U' S U)_{j,j} \right)^{n_l}} \right]^{m/2} \quad \text{for } l = 1, \ldots, k, \tag{4.10}$$

where $N_l$ is the set of integers $n_1 + \ldots + n_{l-1} + 1, \ldots, n_1 + \ldots + n_l$. When the eigenvectors
are misspecified the inference provided will be faulty. For the earlier example, Figure
4-1(e) illustrates this for the case when it is assumed that the population eigenvectors are
$I$ when they are really $U \neq I$. Testing the hypothesis $\Sigma = \Sigma_0$ reduces to testing the
null hypothesis $\Sigma = I$ when the transformation $\tilde{x}_i = \Sigma_0^{-1/2} x_i$ is applied. The robustness
of tests for sphericity in high dimensional settings has been extensively discussed in [55].

■ 4.1.3 Proposed statistical eigen-inference techniques

In this chapter our focus is on developing population eigenvalue estimation and testing
algorithms for models of the form in (4.2) that are robust to high-dimensionality,
sample size constraints and population eigenvector misspecification. We are able to develop
such computationally feasible algorithms by exploiting the properties of the eigenvalues
of large Wishart matrices. These results analytically describe the non-random blurring
of the sample eigenvalues, relative to the population eigenvalues, in the $n, m(n) \to \infty$
limit, while compensating for the random fluctuations about the limiting behavior due
to finite dimensionality effects. This allows us to handle the situation where the sample
eigenvalues are blurred to the point that the block subspace structure of the population
eigenvalues cannot be visually discerned, as in Figure 4-1(a), thereby extending the
“signal” detection capability beyond the special cases tackled in [88]. The nature of the
mathematics being exploited makes the algorithms robust to the high-dimensionality and sample
size constraints while the reliance on the sample eigenvalues alone makes them
insensitive to any assumptions on the population eigenvectors. In situations where the
eigenvectors are accurately modelled, the practitioner may use the proposed
methodologies to complement and “robustify” the inference provided by estimation and testing
methodologies that exploit the eigenvector structure.

We consider testing the hypothesis for the equality of the population eigenvalues
and statistical inference about the population eigenvalues. In other words, for some
unknown $U$, if $\Sigma_\theta = U \Lambda_\theta U'$ where $\Lambda_\theta$ is modelled as in (4.2), techniques to 1) test if
$\Sigma = \Sigma_{\theta_0}$, and 2) estimate $\theta_0$ are summarized in Table 4.1. We note that inference on the
population eigenvalues is performed using the entire sample eigen-spectrum unlike (4.6)
and (4.9). This reflects the inherent non-linearities of the sample eigenvalue blurring
induced by high-dimensionality and sample size constraints. An important implication
of this in practice is that in high dimensional, sample size starved settings, inference
performed  on  a  subset  of  sample  eigenvalues  alone  is  likely  to  be  inaccurate,  or  worse 
misleading.  In  such  settings,  practitioners  are  advised  to  consider  tests  (such  as  the  ones 
proposed)  for  the  equality  of  the  entire  population  eigen-spectrum  instead  of  testing  for 
the  equality  of  individual  population  eigenvalues. 
Figure 4-1. The challenge of estimating the population eigenvalues from the sample eigenvalues in high-dimensional settings. Panels: (a) Sample eigenvalues versus true eigenvalues ($n = 100$, $m = 300$). (b) Sample eigenvectors when $U = I$. (c) Diagonal elements of $S$ when $U = I$. (d) Sample eigenvectors for arbitrary $U$. (e) Diagonal elements of $S$ for arbitrary $U$.
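Testing: $H_{\theta_0}: h(\theta) := v_\theta^T Q_\theta^{-1} v_\theta \sim \chi^2_q$, tested against a threshold $\gamma$, with $q = \dim(v_\theta) = 2$

Estimation: $\hat{\theta} = \arg\min_{\theta \in \Theta} \left\{ v_\theta^T Q_\theta^{-1} v_\theta + \log\det Q_\theta \right\}$, with $q = \dim(v_\theta) > \dim(\theta)$

Legend: $(v_\theta)_j = n \times \left( \frac{1}{n} \mathrm{Tr}\, S^j - E\left[ \frac{1}{n} \mathrm{Tr}\, S^j \right] \right)$ for $j = 1, \ldots, q$; $Q_\theta = \mathrm{cov}[v_\theta v_\theta']$


Table 4.1. Structure of proposed algorithms.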


■  4.2  Preliminaries 

Definition 4.21. Let $A \equiv (A_N)_{N \in \mathbb{N}}$ be an $N \times N$ matrix with real eigenvalues. The
$j$-th sample moment is defined as

$$\mathrm{tr}(A^j) := \frac{1}{N} \mathrm{Tr}(A^j),$$

where Tr is the un-normalized trace.

Definition 4.22. Let $A \equiv (A_N)_{N \in \mathbb{N}}$ be a sequence of self-adjoint $N \times N$ random
matrices. If the limit of all moments defined as

$$\alpha_j^A := \lim_{N \to \infty} E[\mathrm{tr}(A_N^j)] \quad (j \in \mathbb{N})$$

exists then we say that $A$ has a limit eigenvalue distribution.

Notation 4.23. For a random matrix $A$ with a limit eigenvalue distribution we denote
by $M_A(x)$ the moment power series, which we define by

$$M_A(x) := 1 + \sum_{j \geq 1} \alpha_j^A x^j.$$


Notation 4.24. For a random matrix ensemble $A$ with limit eigenvalue distribution we
denote by $g_A(x)$ the corresponding Cauchy transform, which we define as formal power
series by

$$g_A(x) := \lim_{N \to \infty} E\left[ \frac{1}{N} \mathrm{Tr}\, (x I_N - A_N)^{-1} \right] = \frac{1}{x} M_A(1/x).$$


Definition 4.25. Let $A \equiv (A_N)_{N \in \mathbb{N}}$ be a self-adjoint random matrix ensemble. We
say that it has a second order limit distribution if for all $i, j \in \mathbb{N}$ the limits

$$\alpha_j^A := \lim_{N \to \infty} c_1(\mathrm{tr}(A_N^j))$$

and

$$\alpha_{i,j}^A := \lim_{N \to \infty} c_2(\mathrm{Tr}(A_N^i), \mathrm{Tr}(A_N^j))$$

exist and if

$$\lim_{N \to \infty} c_r\left( \mathrm{Tr}(A_N^{j(1)}), \ldots, \mathrm{Tr}(A_N^{j(r)}) \right) = 0$$

for all $r \geq 3$ and all $j(1), \ldots, j(r) \in \mathbb{N}$. In this definition, we denote the (classical)
cumulants by $c_n$. Note that $c_1$ is just the expectation, and $c_2$ the covariance.


Notation 4.26. When $A \equiv (A_N)_{N \in \mathbb{N}}$ has a limit eigenvalue distribution, then the
limits $\alpha_j^A := \lim_{N \to \infty} E[\mathrm{tr}(A_N^j)]$ exist. When $A_N$ has a second order limit distribution,
the fluctuation

$$\mathrm{tr}(A_N^j) - \alpha_j^A$$

is asymptotically Gaussian of order $1/N$. We consider the second order covariances
defined as

$$\alpha_{i,j}^A := \lim_{N \to \infty} \mathrm{cov}(\mathrm{Tr}(A_N^i), \mathrm{Tr}(A_N^j)),$$

and denote by $M_A(x, y)$ the second order moment power series, which we define by:

$$M_A(x, y) := \sum_{i,j \geq 1} \alpha_{i,j}^A x^i y^j.$$

Theorem 4.27. Assume that the $n \times n$ (non-random) covariance matrix $\Sigma = (\Sigma_n)_{n \in \mathbb{N}}$
has a limit eigenvalue distribution. Let $S$ be the (real or complex) sample covariance
matrix formed from the $m$ samples as in (4.1). Then for $n, m \to \infty$ with $n/m \to c \in (0, \infty)$,
$S$ has both a limit eigenvalue distribution and a second order limit distribution.
The Cauchy transform of the limit eigenvalue distribution, $g(x) \equiv g_S(x)$, satisfies the
equation:

$$g(x) = \frac{1}{1 - c + c\,x\,g(x)} \; g_\Sigma\!\left( \frac{x}{1 - c + c\,x\,g(x)} \right), \tag{4.11}$$

with the corresponding power series $M_S(x) = \frac{1}{x} g_S(1/x)$. Define $\tilde{S} = \frac{1}{m} X'X$ so that
its moment power series is given by

$$M_{\tilde{S}}(x) = c\,(M_S(x) - 1) + 1. \tag{4.12}$$

The second order moment generating series is given by

$$M_S(x, y) = M_{\tilde{S}}(x, y) = \frac{2}{\beta} M^{\infty}_{\tilde{S}}(x, y) \tag{4.13a}$$

where

$$M^{\infty}_{\tilde{S}}(x, y) = xy \left( \frac{\frac{d}{dx}\left( x M_{\tilde{S}}(x) \right) \cdot \frac{d}{dy}\left( y M_{\tilde{S}}(y) \right)}{\left( x M_{\tilde{S}}(x) - y M_{\tilde{S}}(y) \right)^2} - \frac{1}{(x - y)^2} \right) \tag{4.13b}$$

where $\beta$ equals 1 (or 2) when the elements of $S$ are real (or complex).

Proof. See Appendix A. □
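
As a quick sanity check of (4.11), consider the special case $\Sigma = I$, which is assumed here purely for illustration. Then every moment of $\Sigma$ equals 1, so by Notation 4.24 $g_\Sigma(x) = 1/(x - 1)$, and substituting into (4.11) gives:

```latex
g(x) = \frac{1}{1 - c + c\,x\,g(x)} \cdot
       \frac{1}{\dfrac{x}{1 - c + c\,x\,g(x)} - 1}
     = \frac{1}{x - 1 + c - c\,x\,g(x)}
\quad\Longrightarrow\quad
c\,x\,g(x)^2 - (x - 1 + c)\,g(x) + 1 = 0 .
```

This is the quadratic satisfied by the Cauchy transform of the Marchenko-Pastur law. Expanding $g(x) = 1/x + 1/x^2 + (1 + c)/x^3 + \ldots$ in it recovers the familiar first two moments, 1 and $1 + c$, of the null Wishart case.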


■  4.3  Computational  aspects 

Proposition 4.31. For $\Sigma_\theta = U \Lambda_\theta U'$ as in (4.2), let $\theta = (t_1, \ldots, t_{k-1}, a_1, \ldots, a_k)$
where $t_i = n_i/n$. Then $S$ has a limit eigenvalue distribution as well as a second order
limit distribution. The moments $\alpha_j^S$, and hence $\alpha_{i,j}^S$, depend on $\theta$ and $c$. Let $v_\theta$ be a
$q$-by-1 vector whose $j$-th element is given by

$$(v_\theta)_j = \mathrm{Tr}\, S^j - n \alpha_j^S.$$

Then for large $n$ and $m$,

$$v_\theta \sim \mathcal{N}(\mu_\theta, Q_\theta) \tag{4.14}$$

where $\mu_\theta = 0$ if $S$ is complex and $(Q_\theta)_{i,j} = \alpha_{i,j}^S$.

Proof. This follows directly from Theorem 4.27. From (4.15) and (4.17), the moments
$\alpha_j^S$ depend on $\alpha_j^\Sigma$ and $c = n/m$ and hence on the unknown parameter vector $\theta$. The
existence of the non-zero mean when $S$ is real follows from the statement in [10]. □


■  4.3.1  Computation  of  moments  of  limiting  eigenvalue  distribution 

Equation (4.11) expresses the relationship between the moment power series of $\Sigma$ and
that of $S$ via the limit of the ratio $n/m$. We can hence express the expected moments
of $S$ in terms of the moments of $\Sigma$. The general form of the moments of $\tilde{S}$, given by
Corollary 9.12 in [68, pp. 143], is

$$\alpha_j^{\tilde{S}} = \sum_{i_1 + 2 i_2 + 3 i_3 + \cdots + j i_j = j} c^{\,i_1 + i_2 + \cdots + i_j} \, (\alpha_1^{\Sigma})^{i_1} (\alpha_2^{\Sigma})^{i_2} \cdots (\alpha_j^{\Sigma})^{i_j} \cdot \gamma^{(j)}_{i_1, i_2, \ldots, i_j}, \tag{4.15}$$

where $\gamma^{(j)}_{i_1, \ldots, i_j}$ is the multinomial coefficient given by

$$\gamma^{(j)}_{i_1, i_2, \ldots, i_j} = \frac{j!}{i_1! \, i_2! \cdots i_j! \, (j + 1 - (i_1 + i_2 + \cdots + i_j))!}. \tag{4.16}$$

The multinomial coefficient in (4.16) has an interesting combinatorial interpretation.
Let $j$ be a positive integer, and let $i_1, \ldots, i_j \in \mathbb{N} \cup \{0\}$ be such that $i_1 + 2 i_2 + \cdots + j i_j = j$.
The number of partitions $\pi \in NC(j)$ which have $i_1$ blocks with 1 element, $i_2$ blocks
with 2 elements, ..., $i_j$ blocks with $j$ elements is given by the multinomial coefficient
$\gamma^{(j)}_{i_1, \ldots, i_j}$. The moments of $S$ are related to the moments of $\tilde{S}$ as

$$\alpha_j^{\tilde{S}} = c \, \alpha_j^{S} \quad \text{for } j = 1, 2, \ldots \tag{4.17}$$

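We can use (4.15) to compute the first few moments of $\tilde{S}$, and hence of $S$, in terms of the moments
of $\Sigma$. This involves enumerating the partitions that appear in the computation of the
multinomial coefficient in (4.16). For $j = 1$ only $i_1 = 1$ contributes with $\gamma^{(1)}_{1} = 1$, thus

$$\alpha_1^{\tilde{S}} = c \, \alpha_1^{\Sigma}. \tag{4.18}$$

For $j = 2$ only $i_1 = 2$, $i_2 = 0$ and $i_1 = 0$, $i_2 = 1$ contribute with

$$\gamma^{(2)}_{2,0} = 1, \quad \gamma^{(2)}_{0,1} = 1,$$

and thus

$$\alpha_2^{\tilde{S}} = c \, \alpha_2^{\Sigma} + c^2 (\alpha_1^{\Sigma})^2. \tag{4.19}$$

For $j = 3$ we have three possibilities for the indices, contributing with

$$\gamma^{(3)}_{3,0,0} = 1, \quad \gamma^{(3)}_{1,1,0} = 3, \quad \gamma^{(3)}_{0,0,1} = 1,$$

thus

$$\alpha_3^{\tilde{S}} = c \, \alpha_3^{\Sigma} + 3 c^2 \alpha_1^{\Sigma} \alpha_2^{\Sigma} + c^3 (\alpha_1^{\Sigma})^3. \tag{4.20}$$

For $j = 4$ we have five possibilities for the indices, contributing with

$$\gamma^{(4)}_{4,0,0,0} = 1, \quad \gamma^{(4)}_{2,1,0,0} = 6, \quad \gamma^{(4)}_{0,2,0,0} = 2, \quad \gamma^{(4)}_{1,0,1,0} = 4, \quad \gamma^{(4)}_{0,0,0,1} = 1,$$

thus

$$\alpha_4^{\tilde{S}} = c \, \alpha_4^{\Sigma} + 4 c^2 \alpha_1^{\Sigma} \alpha_3^{\Sigma} + 2 c^2 (\alpha_2^{\Sigma})^2 + 6 c^3 (\alpha_1^{\Sigma})^2 \alpha_2^{\Sigma} + c^4 (\alpha_1^{\Sigma})^4. \tag{4.21}$$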
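
The enumeration behind (4.15) and (4.16) is mechanical, so it can be sketched in a few lines of plain Python. The function names below (`multiplicity_vectors`, `gamma_coefficient`, `moment_s_tilde`) are hypothetical and chosen here only for illustration; the code simply loops over all index vectors $(i_1, \ldots, i_j)$ with $i_1 + 2 i_2 + \cdots + j i_j = j$ and accumulates the corresponding terms.

```python
from math import factorial


def multiplicity_vectors(j):
    """Yield all tuples (i_1, ..., i_j) of non-negative integers with sum_l l * i_l = j."""
    def extend(block_size, remaining):
        if block_size > j:
            if remaining == 0:
                yield ()
            return
        for count in range(remaining // block_size + 1):
            for rest in extend(block_size + 1, remaining - block_size * count):
                yield (count,) + rest
    yield from extend(1, j)


def gamma_coefficient(j, i):
    """Multinomial coefficient of (4.16) for the index vector i = (i_1, ..., i_j)."""
    denominator = factorial(j + 1 - sum(i))
    for count in i:
        denominator *= factorial(count)
    return factorial(j) // denominator


def moment_s_tilde(j, alpha_sigma, c):
    """j-th limiting moment from (4.15); alpha_sigma[l - 1] holds the l-th moment of Sigma."""
    total = 0.0
    for i in multiplicity_vectors(j):
        term = gamma_coefficient(j, i) * c ** sum(i)
        for block_size, count in enumerate(i, start=1):
            term *= alpha_sigma[block_size - 1] ** count
        total += term
    return total


# Arbitrary illustrative inputs: moments a1..a4 of Sigma and the ratio c.
a1, a2, a3, a4 = 2.0, 5.0, 14.0, 41.0
c = 0.5
fourth = moment_s_tilde(4, [a1, a2, a3, a4], c)
```

For $j \leq 4$ the output agrees with the closed forms (4.18) to (4.21); for larger $j$ the same loop avoids writing the expressions out by hand, at the cost of enumerating all integer partitions of $j$.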


For specific instances of $\Sigma$, we simply plug in the moments $\alpha_j^{\Sigma}$ into the above
expressions to get the corresponding moments of $S$. The general formula in (4.15) can
be used to generate the expressions for higher order moments as well though such an
explicit enumeration will be quite tedious even if symbolic software is used.

An alternate method is to use the software package RMTool [73] based on the
“polynomial method” developed in the second part of this dissertation. The software
enables the moments of $S$ to be enumerated rapidly whenever the moment power series
of $\Sigma$ is an algebraic power series, i.e., it is the solution of an algebraic equation. This is
always the case when $\Sigma$ is of the form in (4.2). For example, if $\theta = (t_1, t_2, a_1, a_2, a_3)$ then
we can obtain the moments of $S$ by typing in the following sequence of commands in
MATLAB once RMTool has been installed. This eliminates the need to manually
obtain the expressions for the moments a priori.


>> startRMTool
>> syms c t1 t2 a1 a2 a3
>> number_of_moments = 5;
>> LmzSigma = atomLmz([a1 a2 a3], [t1 t2 1-(t1+t2)]);
>> LmzS = AtimesWish(LmzSigma, c);
>> alpha_S = Lmz2MomF(LmzS, number_of_moments);
>> alpha_Stilde = c*alpha_S;


■ 4.3.2 Computation of covariance moments of second order limit distribution


Equations (4.13a) and (4.13b) express the relationship between the covariance of the
second order limit distribution and the moments of $S$. Let $M(x)$ denote a moment power
series as in Notation 4.23 with coefficients $\alpha_j$. Define the power series $H(x) = x M(x)$
and let

$$\mathcal{H}(x, y) := \left( \frac{\frac{d}{dx} H(x) \cdot \frac{d}{dy} H(y)}{(H(x) - H(y))^2} - \frac{1}{(x - y)^2} \right) \tag{4.22}$$

so that $M^{\infty}(x, y) = xy \, \mathcal{H}(x, y)$. The $(i, j)$-th coefficient of $M^{\infty}(x, y)$ can then be
extracted from a multivariate Taylor series expansion of $\mathcal{H}(x, y)$ about $x = 0$, $y = 0$.
From (4.13a), we then obtain the coefficients $\alpha_{i,j}^{S} = (2/\beta) \alpha_{i,j}^{\infty}$. This is best done using
the MAPLE symbolic package where the following sequence of commands enumerates
the coefficients $\alpha_{i,j}^{S}$ for $\beta = 1, 2$ and indices $i$ and $j$ such that $i + j \leq$ 2 max_coeff.


> with(numapprox):
> max_coeff := 5;
> H := x -> x*(1+sum(alpha[j]*x^j, j=1..2*max_coeff)):
> dHx := diff(H(x),x): dHy := diff(H(y),y):
> H2 := simplify(dHx*dHy/(H(x)-H(y))^2 - 1/(x-y)^2):
> H2series := mtaylor(H2, [x,y], 2*max_coeff):
> i := 5: j := 2:
> M2_infty_coeff[i,j] := simplify(coeff(coeff(H2series, x, i-1), y, j-1)):
> alphaS_second[i,j] := (2/beta)*M2_infty_coeff[i,j]:
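
For readers without access to MAPLE, the same coefficients can be evaluated numerically (rather than symbolically) with NumPy. The following is a hedged sketch, not a translation of the session above: it represents truncated bivariate power series as square arrays, forms $H'(x) H'(y) / D(x, y)^2 - 1$ with $D = (H(x) - H(y))/(x - y)$, and divides by $(x - y)$ twice degree by degree. All function names are hypothetical, and only coefficients of total degree at most `2 * max_index - 2` are meaningful in the returned array.

```python
import numpy as np


def mul(A, B):
    """Product of two bivariate series, keeping exponents < T in each variable."""
    T = A.shape[0]
    C = np.zeros((T, T))
    for a in range(T):
        for b in range(T):
            if A[a, b] != 0.0:
                C[a:, b:] += A[a, b] * B[:T - a, :T - b]
    return C


def inverse(F):
    """Inverse of a series with constant term 1, via the geometric series in (1 - F)."""
    T = F.shape[0]
    E = -F.copy()
    E[0, 0] += 1.0                      # E = 1 - F has zero constant term
    term = np.zeros((T, T))
    term[0, 0] = 1.0
    total = term.copy()
    for _ in range(2 * T):              # higher powers of E vanish after truncation
        term = mul(term, E)
        total += term
    return total


def divide_by_x_minus_y(P):
    """Exact division by (x - y), one homogeneous degree at a time."""
    T = P.shape[0]
    Q = np.zeros((T, T))
    for d in range(1, T):
        running = 0.0
        for a in range(d):
            running += P[a, d - a]
            Q[a, d - 1 - a] = -running
    return Q


def second_order_coefficients(alpha, max_index):
    """Return an array whose [i-1, j-1] entry is the (i, j) coefficient of M^infty in (4.22).

    alpha[k - 1] is the k-th moment; moments up to order 2 * max_index are required.
    """
    T = 2 * max_index + 1
    a = np.concatenate(([1.0], np.asarray(alpha, dtype=float)))   # a[0] = 1
    if len(a) < T:
        raise ValueError("need moments up to order 2 * max_index")
    dH = np.array([(k + 1) * a[k] for k in range(T)])   # H'(x) = sum_k (k+1) a_k x^k
    numerator = np.outer(dH, dH)                        # H'(x) H'(y)
    D = np.zeros((T, T))                                # (H(x) - H(y)) / (x - y)
    for p in range(T):
        for q in range(T - p):
            D[p, q] = a[p + q]
    G = mul(numerator, inverse(mul(D, D)))
    G[0, 0] -= 1.0
    return divide_by_x_minus_y(divide_by_x_minus_y(G))


# Arbitrary illustrative moment values (not tied to any particular Sigma).
alpha = [0.5, 1.3, 2.0, 4.1]
coeffs = second_order_coefficients(alpha, max_index=2)
```

Multiplying the returned entries by $2/\beta$ gives the covariances $\alpha_{i,j}^{S}$ as in (4.13a). Setting every moment to 1 returns zeros in the valid range, which matches the identity-matrix remark made in the text.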

Table 4.2 lists some of the coefficients of $M^{\infty}$ obtained using this procedure. When
$\alpha_j = 1$ for all $j \in \mathbb{N}$, then $\alpha_{i,j} = 0$ as expected, since $\alpha_j = 1$ denotes the identity matrix.
Note that the moments $\alpha_1, \ldots, \alpha_{i+j}$ are needed to compute the second order covariance
moments $\alpha_{i,j} = \alpha_{j,i}$.

The covariance matrix $Q$ with elements $Q_{i,j} = \alpha_{i,j}$ gets increasingly ill-conditioned
as $\dim(Q)$ increases; the growth in the magnitude of the diagonal entries $\alpha_{j,j}$ in Table
4.2 attests to this. This implies that the eigenvectors of $Q$ encode the information
about the covariance of the second order limit distribution more efficiently than the
matrix $Q$ itself. When $\Sigma = I$ so that the SCM $S$ has the (null) Wishart distribution,
the eigenvectors of $Q$ are the (appropriately normalized) Chebychev polynomials of the
second kind [64]. The structure of the eigenvectors for arbitrary $\Sigma$ is, as yet, unknown
though research in that direction might yield additional insights.
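Coefficient    Expression

$\alpha_{1,1}$    $\alpha_2 - \alpha_1^2$

$\alpha_{2,1}$    $-4\alpha_1\alpha_2 + 2\alpha_1^3 + 2\alpha_3$

$\alpha_{2,2}$    $16\alpha_1^2\alpha_2 - 6\alpha_2^2 - 6\alpha_1^4 - 8\alpha_1\alpha_3 + 4\alpha_4$

$\alpha_{3,1}$    $9\alpha_1^2\alpha_2 - 6\alpha_1\alpha_3 - 3\alpha_2^2 + 3\alpha_4 - 3\alpha_1^4$

$\alpha_{3,2}$    $6\alpha_5 + 30\alpha_1\alpha_2^2 - 42\alpha_1^3\alpha_2 - 18\alpha_2\alpha_3 + 12\alpha_1^5 + 24\alpha_1^2\alpha_3 - 12\alpha_1\alpha_4$

$\alpha_{3,3}$    $-18\alpha_3^2 - 27\alpha_2\alpha_4 + 9\alpha_6 - 30\alpha_1^6 + 21\alpha_2^3 + 36\alpha_1^2\alpha_4 - 135\alpha_1^2\alpha_2^2 + 108\alpha_1\alpha_2\alpha_3 - 18\alpha_1\alpha_5 - 72\alpha_1^3\alpha_3 + 126\alpha_1^4\alpha_2$

$\alpha_{4,1}$    $12\alpha_1\alpha_2^2 - 16\alpha_1^3\alpha_2 - 8\alpha_2\alpha_3 + 12\alpha_1^2\alpha_3 - 8\alpha_1\alpha_4 + 4\alpha_1^5 + 4\alpha_5$

$\alpha_{4,2}$    $-12\alpha_3^2 - 24\alpha_2\alpha_4 + 8\alpha_6 - 20\alpha_1^6 + 16\alpha_2^3 + 32\alpha_1^2\alpha_4 - 56\alpha_1^3\alpha_3 + 88\alpha_1^4\alpha_2 - 96\alpha_1^2\alpha_2^2 + 80\alpha_1\alpha_2\alpha_3 - 16\alpha_1\alpha_5$

$\alpha_{4,3}$    $96\alpha_2^2\alpha_3 + 60\alpha_1^7 + 84\alpha_1\alpha_3^2 + 432\alpha_1^3\alpha_2^2 + 180\alpha_1^4\alpha_3 - 48\alpha_3\alpha_4 + 12\alpha_7 - 36\alpha_2\alpha_5 - 24\alpha_1\alpha_6 + 144\alpha_1\alpha_2\alpha_4 + 48\alpha_1^2\alpha_5 - 96\alpha_1^3\alpha_4 - 156\alpha_1\alpha_2^3 - 300\alpha_1^5\alpha_2 - 396\alpha_1^2\alpha_2\alpha_3$

$\alpha_{4,4}$    $-140\alpha_1^8 - 76\alpha_2^4 - 48\alpha_6\alpha_2 + 256\alpha_3\alpha_4\alpha_1 - 40\alpha_4^2 + 16\alpha_8 - 64\alpha_3\alpha_5 - 32\alpha_1\alpha_7 + 1408\alpha_1^3\alpha_2\alpha_3 - 336\alpha_1^2\alpha_3^2 + 256\alpha_1^4\alpha_4 + 144\alpha_2^2\alpha_4 - 480\alpha_1^5\alpha_3 + 160\alpha_2\alpha_3^2 + 64\alpha_1^2\alpha_6 - 128\alpha_1^3\alpha_5 - 1440\alpha_1^4\alpha_2^2 + 832\alpha_1^2\alpha_2^3 + 800\alpha_1^6\alpha_2 - 768\alpha_1\alpha_2^2\alpha_3 - 576\alpha_1^2\alpha_2\alpha_4 + 192\alpha_1\alpha_2\alpha_5$

$\alpha_{5,1}$    $-5\alpha_3^2 - 10\alpha_2\alpha_4 + 5\alpha_6 - 5\alpha_1^6 + 5\alpha_2^3 + 15\alpha_1^2\alpha_4 - 20\alpha_1^3\alpha_3 + 25\alpha_1^4\alpha_2 - 30\alpha_1^2\alpha_2^2 + 30\alpha_1\alpha_2\alpha_3 - 10\alpha_1\alpha_5$

$\alpha_{5,2}$    $60\alpha_2^2\alpha_3 + 30\alpha_1^7 + 50\alpha_1\alpha_3^2 + 240\alpha_1^3\alpha_2^2 + 110\alpha_1^4\alpha_3 - 30\alpha_3\alpha_4 + 10\alpha_7 - 30\alpha_2\alpha_5 - 20\alpha_1\alpha_6 + 100\alpha_1\alpha_2\alpha_4 + 40\alpha_1^2\alpha_5 - 70\alpha_1^3\alpha_4 - 90\alpha_1\alpha_2^3 - 160\alpha_1^5\alpha_2 - 240\alpha_1^2\alpha_2\alpha_3$

$\alpha_{5,3}$    $-105\alpha_1^8 - 60\alpha_2^4 - 45\alpha_6\alpha_2 + 210\alpha_3\alpha_4\alpha_1 - 30\alpha_4^2 + 15\alpha_8 - 60\alpha_3\alpha_5 - 30\alpha_1\alpha_7 + 1140\alpha_1^3\alpha_2\alpha_3 - 270\alpha_1^2\alpha_3^2 + 225\alpha_1^4\alpha_4 + 120\alpha_2^2\alpha_4 - 390\alpha_1^5\alpha_3 + 135\alpha_2\alpha_3^2 + 60\alpha_1^2\alpha_6 - 120\alpha_1^3\alpha_5 - 1125\alpha_1^4\alpha_2^2 + 660\alpha_1^2\alpha_2^3 + 615\alpha_1^6\alpha_2 - 630\alpha_1\alpha_2^2\alpha_3 - 495\alpha_1^2\alpha_2\alpha_4 + 180\alpha_1\alpha_2\alpha_5$

$\alpha_{5,4}$    $-900\alpha_1^2\alpha_4\alpha_3 + 80\alpha_1^2\alpha_7 - 160\alpha_1^3\alpha_6 - 620\alpha_1^5\alpha_4 - 3200\alpha_1^3\alpha_2^3 + 700\alpha_1\alpha_2^4 + 3960\alpha_1^5\alpha_2^2 - 720\alpha_1^2\alpha_5\alpha_2 + 1840\alpha_1^3\alpha_4\alpha_2 - 4100\alpha_1^4\alpha_3\alpha_2 + 3600\alpha_1^2\alpha_2^2\alpha_3 - 1140\alpha_1\alpha_3^2\alpha_2 + 1040\alpha_1^3\alpha_3^2 - 440\alpha_2^3\alpha_3 + 440\alpha_3\alpha_4\alpha_2 + 240\alpha_1\alpha_6\alpha_2 + 320\alpha_1\alpha_5\alpha_3 - 1020\alpha_1\alpha_2^2\alpha_4 + 20\alpha_9 - 1820\alpha_1^7\alpha_2 + 180\alpha_2^2\alpha_5 + 320\alpha_1^4\alpha_5 + 180\alpha_1\alpha_4^2 + 1120\alpha_1^6\alpha_3 + 80\alpha_3^3 + 280\alpha_1^9 - 40\alpha_1\alpha_8 - 60\alpha_7\alpha_2 - 80\alpha_3\alpha_6 - 100\alpha_4\alpha_5$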

as,5  2400  a2ar,a  1 :t—  1 350  a/2  a*  a  1 4600  aaa*a2 4 300  a  \  0702  -  000  aija2a  1 2— 1 200  a^a^v  3  2  4 

400  a  1  (*<;»:<  4  3000a;ift4ttia  4  5 1 00  a /a /cm  4  1 2300  o/a2<M  4  5700  a/a2a/  4 
4400  a 3  a2Vi  4  400  a  j  4a,,  -  1 5000  a/o22a:<  -  5750  a/a2a4  -  200  a  [  *a7  4  500  a  1  n4nr,  4 
225  afla/  -  675  a/a/  -  3250  a/a/  -  625a/*a4  4  350a/a4  -  600  a  10/ 

1 050 a/a/  -  2800  a^v 3  7  -  1 1 550 a/a/  -  3300 a;ia4a ja2  -  800 cu>afJ  4  325 a.,2f*2 
4375al2a24  -  630^3 1,1  4  KH)a8ai2  -  75ar,2  4  255a/'  4  12U0Qai4a/  4  4550a/a2  4 
I550a/a4  4  25aM>  —  50a  1^1  -  75a2a^  -  100 a^a?  -  125a4a(i 


Table 4.2. Relationship between the coefficients $\alpha_{i,j} = \alpha_{j,i}$ and $\alpha_i$.
■ 4.4 Eigen-inference algorithms

■ 4.4.1 Estimating $\theta$ for known model order

Estimating the unknown parameter vector $\theta$ follows from the asymptotic result in
Proposition 4.31. For large n and m, since $v_\theta$ is (approximately) normally distributed, we
can obtain the estimate $\hat{\theta}$ by the principle of maximum likelihood. When S is real, Bai
and Silverstein provide a formula, expressed as a difficult-to-compute contour integral,
for the correction term $\mu_\theta$ in (4.14). The log-likelihood of $v_\theta$ is (ignoring constants and
the correction term for the mean when S is real) given by

\[ \ell(v_\theta \mid \theta) \approx -v_\theta^{T} Q_\theta^{-1} v_\theta - \log\det Q_\theta, \qquad (4.23) \]

which allows us to obtain the maximum-likelihood estimate of $\theta$ as

\[ \hat{\theta} = \arg\min_{\theta \in \Theta} \; v_\theta^{T} Q_\theta^{-1} v_\theta + \log\det Q_\theta \quad \text{for } q = \dim(v_\theta) \ge \dim(\theta), \qquad (4.24) \]

where $\Theta$ represents the parameter space for the elements of $\theta$, and $v_\theta$ and $Q_\theta$ are
constructed as in Proposition 4.31.

Canonically, the parameter vector $\theta$ of models such as (4.2) is of length $2k - 1$, so
that $q = \dim(v_\theta) \ge 2k - 1$. In principle, estimation accuracy should increase with q
since the covariance of $v_\theta$ is explicitly accounted for via the weighting matrix $Q_\theta$.

Figure 4-2 compares the quantiles of the test statistic $v_\theta^{T} Q_\theta^{-1} v_\theta$ for $\dim(v_\theta) = q$ with
the quantiles of the chi-square distribution with q degrees of freedom when q = 2, 3 for
the model in (4.2) with $\theta = (0.3, 2, 1)$ and m = n, for m = 40 and m = 320. While there is
good agreement with the theoretical distribution for large m, n, the deviation from the
limiting result is not insignificant for moderate m, n. This justifies setting q = 2 for the
testing procedures developed herein.

Hence, we suggest that for the estimation in (4.24), $q = \dim(v_\theta) = \dim(\theta)$. This
choice provides robustness in low to moderate dimensional settings where the deviations
from the asymptotic result in Theorem 4.27 are not insignificant. Numerical simulations
suggest that the resulting degradation in estimation accuracy in high dimensional
settings, from such a choice, is relatively small. This loss in performance is offset by an
increase in the speed of the underlying numerical optimization routine. This is the case
because, though the dimensionality of $\theta$ is the same, the matrix $Q_\theta$ gets increasingly
ill-conditioned for higher values of q, thereby reducing the efficiency of optimization
methods.




CHAPTER  4.  STATISTICAL  EIGEN-INFERENCE:  LARGE  WISHART  MATRICES 


[Figure 4-2: quantile-quantile plots (axis: quantiles of input sample). Panels: (a) n = m = 40; (b) n = m = 320; (c) n = m = 40; (d) n = m = 320.]

Figure 4-2. Numerical simulations (when S is complex) illustrating the robustness of test statistics
formed with dim(v) = 2 to moderate dimensional settings.
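■ 4.4.2 Testing $\theta = \theta_0$

Proposition 4.41. Define the vector $v_\theta$ and the covariance matrix $Q_\theta$ as

\[ v_\theta = \begin{bmatrix} \mathrm{Tr}\, S - n\,\alpha_1^{\Sigma} \\[4pt] \mathrm{Tr}\, S^2 - n\left(\alpha_2^{\Sigma} + \frac{n}{m}(\alpha_1^{\Sigma})^2\right) - \left(\frac{2}{\beta} - 1\right)\alpha_2^{\Sigma}\,\frac{n}{m} \end{bmatrix} \qquad (4.25a) \]

\[ Q_\theta = \frac{2}{\beta} \begin{bmatrix} \tilde{\alpha}_2 - \tilde{\alpha}_1^2 & 2\tilde{\alpha}_1^3 + 2\tilde{\alpha}_3 - 4\tilde{\alpha}_1\tilde{\alpha}_2 \\[4pt] 2\tilde{\alpha}_1^3 + 2\tilde{\alpha}_3 - 4\tilde{\alpha}_1\tilde{\alpha}_2 & 4\tilde{\alpha}_4 - 8\tilde{\alpha}_1\tilde{\alpha}_3 - 6\tilde{\alpha}_2^2 + 16\tilde{\alpha}_1^2\tilde{\alpha}_2 - 6\tilde{\alpha}_1^4 \end{bmatrix} \qquad (4.25b) \]

with $\beta = 1$ (or 2) when S is real (or complex) and $\tilde{\alpha}_i$ given by

\[ \tilde{\alpha}_1 = \frac{n}{m}\,\alpha_1^{\Sigma} \qquad (4.26a) \]

\[ \tilde{\alpha}_2 = \frac{n}{m}\,\alpha_2^{\Sigma} + \frac{n^2}{m^2}(\alpha_1^{\Sigma})^2 \qquad (4.26b) \]

\[ \tilde{\alpha}_3 = \frac{n}{m}\,\alpha_3^{\Sigma} + 3\,\frac{n^2}{m^2}\,\alpha_1^{\Sigma}\alpha_2^{\Sigma} + \frac{n^3}{m^3}(\alpha_1^{\Sigma})^3 \qquad (4.26c) \]

\[ \tilde{\alpha}_4 = \frac{n}{m}\,\alpha_4^{\Sigma} + 4\,\frac{n^2}{m^2}\,\alpha_1^{\Sigma}\alpha_3^{\Sigma} + 2\,\frac{n^2}{m^2}(\alpha_2^{\Sigma})^2 + 6\,\frac{n^3}{m^3}(\alpha_1^{\Sigma})^2\alpha_2^{\Sigma} + \frac{n^4}{m^4}(\alpha_1^{\Sigma})^4 \qquad (4.26d) \]

and $\alpha_i^{\Sigma} = (1/n)\,\mathrm{Tr}\,\Sigma^i$. Thus, for large n and m, $v_\theta \sim \mathcal{N}(0, Q_\theta)$ so that

\[ h(\theta) := v_\theta^{T} Q_\theta^{-1} v_\theta \sim \chi^2_2. \]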

Proof.  This  follows  from  Proposition  4.31.  The  correction  term  for  the  real  case  is 
discussed  in  a  different  context  in  [31].  □ 


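We test for $\theta = \theta_0$ by obtaining the test statistic

\[ H_{\theta_0}: \; h(\theta_0) = v_{\theta_0}^{T} Q_{\theta_0}^{-1} v_{\theta_0}, \qquad (4.27) \]

where $v_{\theta_0}$ and $Q_{\theta_0}$ are constructed as in (4.25a) and (4.25b), respectively. We
reject the hypothesis for large values of $h(\theta_0)$. For a choice of threshold $\gamma$, the asymptotic
convergence of the test statistic to the $\chi^2_2$ distribution implies that

\[ \mathrm{Prob.}(H_{\theta_0} = 1 \mid \theta = \theta_0) \approx F_{\chi^2_2}(\gamma), \qquad (4.28) \]

where $F_{\chi^2_2}$ is the cumulative distribution function of the $\chi^2_2$ distribution.
Thus, for large n and m, when $\gamma = 5.9914$, $\mathrm{Prob.}(H_{\theta_0} = 1 \mid \theta = \theta_0) \approx 0.95$.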
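
The test in (4.27)-(4.28) can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used for the simulations: the function names (`tilde_moments`, `q_matrix`, `v_vector`, `h_statistic`, `chi2_2_cdf`) are hypothetical, the second entry of the vector follows the reading of (4.25a) with the real-case correction term $(2/\beta - 1)\alpha_2^{\Sigma} n/m$, and the closed form $1 - e^{-x/2}$ is the standard CDF of a chi-square variable with two degrees of freedom.

```python
import math

def tilde_moments(alpha, c):
    """Auxiliary moments (4.26a)-(4.26d).

    alpha: (alpha_1, ..., alpha_4) with alpha_k = (1/n) Tr Sigma^k
    c:     the ratio n/m
    """
    a1, a2, a3, a4 = alpha
    t1 = c * a1
    t2 = c * a2 + c**2 * a1**2
    t3 = c * a3 + 3 * c**2 * a1 * a2 + c**3 * a1**3
    t4 = (c * a4 + 4 * c**2 * a1 * a3 + 2 * c**2 * a2**2
          + 6 * c**3 * a1**2 * a2 + c**4 * a1**4)
    return t1, t2, t3, t4

def q_matrix(alpha, c, beta):
    """2 x 2 covariance matrix of (4.25b); beta = 1 (real) or 2 (complex)."""
    t1, t2, t3, t4 = tilde_moments(alpha, c)
    q11 = t2 - t1**2
    q12 = 2 * t1**3 + 2 * t3 - 4 * t1 * t2
    q22 = 4 * t4 - 8 * t1 * t3 - 6 * t2**2 + 16 * t1**2 * t2 - 6 * t1**4
    s = 2.0 / beta
    return [[s * q11, s * q12], [s * q12, s * q22]]

def v_vector(tr_s, tr_s2, alpha, n, m, beta):
    """Centered traces of (4.25a), given Tr S and Tr S^2."""
    a1, a2 = alpha[0], alpha[1]
    c = n / m
    v1 = tr_s - n * a1
    v2 = tr_s2 - n * (a2 + c * a1**2) - (2.0 / beta - 1.0) * a2 * c
    return [v1, v2]

def h_statistic(v, q):
    """Quadratic form v^T Q^{-1} v for a 2 x 2 matrix Q, as in (4.27)."""
    det = q[0][0] * q[1][1] - q[0][1] * q[1][0]
    w0 = (q[1][1] * v[0] - q[0][1] * v[1]) / det
    w1 = (-q[1][0] * v[0] + q[0][0] * v[1]) / det
    return v[0] * w0 + v[1] * w1

def chi2_2_cdf(x):
    """CDF of the chi-square distribution with 2 degrees of freedom."""
    return 1.0 - math.exp(-x / 2.0) if x > 0 else 0.0

# Example: Sigma = I (all alpha_k = 1), complex data, n/m = 0.5.
Q = q_matrix((1.0, 1.0, 1.0, 1.0), 0.5, beta=2)
v = v_vector(10.0, 15.0, (1.0, 1.0, 1.0, 1.0), n=10, m=20, beta=2)
accept = h_statistic(v, Q) <= 5.9914
```

For $\Sigma = I$ the matrix returned by `q_matrix` coincides with the closed form for the null Wishart case, which gives a quick sanity check on the coefficients of (4.25b).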
■ 4.4.3 Estimating $\theta$ and testing $\theta = \hat{\theta}$

When an estimate $\hat{\theta}$ is obtained using (4.24), we may test for $\theta = \hat{\theta}$ by forming the test
statistic

\[ H_{\hat{\theta}}: \; h(\hat{\theta}) = u_{\hat{\theta}}^{T} W_{\hat{\theta}}^{-1} u_{\hat{\theta}}, \qquad (4.29) \]

where $u_{\hat{\theta}}$ and $W_{\hat{\theta}}$ are constructed as in (4.25a) and (4.25b), respectively. However,
the sample covariance matrix S can no longer be used since the estimate $\hat{\theta}$ was obtained
from it. Instead, we form a test sample covariance matrix constructed from $\lceil m/2 \rceil$
randomly chosen samples. Equivalently, since the samples are assumed to be mutually
independent and identically distributed, we can form the test matrix from the first
$\lceil m/2 \rceil$ samples as

\[ \tilde{S} = \frac{1}{\lceil m/2 \rceil} \sum_{i=1}^{\lceil m/2 \rceil} x_i x_i^{*}. \qquad (4.30) \]

Note that the moments in (4.26) will have to be recomputed using $\Sigma_{\hat{\theta}}$ and $c = n/\lceil m/2 \rceil$. The hypothesis
$\theta = \hat{\theta}$ is tested by rejecting values of the test statistic greater than a threshold $\gamma$. The
threshold is selected using the approximation in (4.28).


■ 4.4.4 Estimating $\theta$ for unknown model order

Suppose we have a family of models parameterized by the vector $\theta^{(k)}$. The elements
of $\theta^{(k)}$ are the free parameters of the model. For the model in (4.2), in the canonical
case $\theta^{(k)} = (t_1, \ldots, t_{k-1}, a_1, \ldots, a_k)$ since $t_1 + \ldots + t_{k-1} + t_k = 1$, so that $\dim(\theta^{(k)}) = 2k - 1$.

If some of the parameters in (4.2) are known, then the parameter vector is modified
accordingly.

When the model order is unknown, we select the model which has the minimum
Akaike Information Criterion (AIC). For the situation at hand we propose that

\[ \hat{\theta} = \hat{\theta}^{(\hat{k})} \quad \text{where} \quad \hat{k} = \arg\min_{k \in \mathbb{N}} \left\{ u_{\hat{\theta}^{(k)}}^{T} W_{\hat{\theta}^{(k)}}^{-1} u_{\hat{\theta}^{(k)}} + \log\det W_{\hat{\theta}^{(k)}} \right\} + 2\dim(\theta^{(k)}), \qquad (4.31) \]

where $u_{\hat{\theta}^{(k)}}$ and $W_{\hat{\theta}^{(k)}}$ are constructed as described in Section 4.4.3 using the test
sample covariance matrix in (4.30). The Bayesian Information Criterion (BIC) may
also be used for model order selection. It would be useful to compare the performance
of these two criteria in situations of practical interest.
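
The selection rule in (4.31) amounts to adding a penalty of twice the number of free parameters to the fitted objective and picking the smallest total. A minimal sketch follows, assuming the per-order objective values (the quadratic form plus log-determinant evaluated at the fitted parameters) have already been computed; the function names and the example numbers are hypothetical and serve only to illustrate the bookkeeping.

```python
def aic_score(objective_value, num_free_params):
    """Penalized objective of (4.31): fitted objective + 2 * dim(theta)."""
    return objective_value + 2 * num_free_params

def select_model_order(fitted):
    """Return the order k with the smallest AIC score.

    fitted: dict mapping k -> fitted objective value for the order-k model,
            where the order-k model has 2k - 1 free parameters.
    """
    return min(fitted, key=lambda k: aic_score(fitted[k], 2 * k - 1))

# Hypothetical fitted objective values for k = 1, 2, 3.
fitted_objectives = {1: 9.0, 2: 3.2, 3: 2.9}
k_hat = select_model_order(fitted_objectives)
```

In this made-up example the order-3 model fits slightly better than the order-2 model, but not by enough to offset its two additional parameters, so the rule selects k = 2.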


■  4.5  Numerical  simulations 

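Let $\Sigma_\theta$ be as in (4.2) with $\theta = (t_1, a_1, a_2)$. When $t_1 = 0.5$, $a_1 = 2$ and $a_2 = 1$ then
half of the population eigenvalues are of magnitude two while the remainder are of
magnitude one. Let the unknown parameter vector be $\theta = (t, a)$ where $t \equiv t_1$ and $a \equiv a_1$.
Using the procedure described in Section 4.3.1, the first four moments can be obtained
as (here $c = n/m$)

\[ \alpha_1^{S} = 1 + t(a - 1) \qquad (4.32a) \]

\[ \alpha_2^{S} = (-2ac + a^2 c + c)\,t^2 + (-1 + 2ac - 2c + a^2)\,t + 1 + c \qquad (4.32b) \]

\[ \alpha_3^{S} = (-3c^2a^2 + a^3c^2 - c^2 + 3ac^2)\,t^3 + (3c^2 + 3c^2a^2 - 3ac - 6ac^2 - 3a^2c + 3a^3c + 3c)\,t^2 + (-3c^2 + a^3 - 1 - 6c + 3ac + 3a^2c + 3ac^2)\,t + 1 + c^2 + 3c \qquad (4.32c) \]

\[ \alpha_4^{S} = \alpha_4^{\Sigma} + 4c\,\alpha_1^{\Sigma}\alpha_3^{\Sigma} + 2c\,(\alpha_2^{\Sigma})^2 + 6c^2 (\alpha_1^{\Sigma})^2 \alpha_2^{\Sigma} + c^3 (\alpha_1^{\Sigma})^4, \qquad (4.32d) \]

where $\alpha_k^{\Sigma} = 1 + t(a^k - 1)$; expanded in powers of t, the constant term of (4.32d) is $1 + c^3 + 6c + 6c^2$.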


From the discussion in Section 4.3.2, we obtain the covariance of the second order
limit  distribution 


.  (4.33) 


where $\beta = 1$ when S is real valued and $\beta = 2$ when S is complex valued.

We then use (4.24) to estimate $\theta$ and hence the unknown parameters t and a.
Tables 4.3 and 4.4 compare the bias and mean squared error of the estimates for t
and a, respectively. Note the $1/n^2$ type decay in the mean squared error and how the
real case has twice the variance of the complex case. As expected from the theory of
maximum likelihood estimation, the estimates become increasingly normal for large n
and m. This is evident from Figure 4-3. As expected, the performance improves as the
dimensionality of the system increases.
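
A small Monte-Carlo style sketch of this experiment is given below for the complex case. It is not the code behind Tables 4.3 and 4.4. It assumes NumPy, replaces the numerical optimizer with a plain grid search over (t, a), uses q = 2, and all names (`population_moments`, `neg_log_likelihood`, `estimate_t_a`) and the grid limits are hypothetical choices made for illustration.

```python
import numpy as np

def population_moments(t, a):
    """alpha_k = (1/n) Tr Sigma^k, k = 1..4, when a fraction t of the
    population eigenvalues equals a and the remainder equals 1."""
    return [t * a**k + (1.0 - t) for k in range(1, 5)]

def neg_log_likelihood(t, a, tr_s, tr_s2, n, m, beta=2):
    """Objective of (4.24) with q = 2; t and a may be NumPy arrays."""
    c = n / m
    a1, a2, a3, a4 = population_moments(t, a)
    # Auxiliary moments (4.26a)-(4.26d).
    b1 = c * a1
    b2 = c * a2 + c**2 * a1**2
    b3 = c * a3 + 3 * c**2 * a1 * a2 + c**3 * a1**3
    b4 = (c * a4 + 4 * c**2 * a1 * a3 + 2 * c**2 * a2**2
          + 6 * c**3 * a1**2 * a2 + c**4 * a1**4)
    s = 2.0 / beta
    q11 = s * (b2 - b1**2)
    q12 = s * (2 * b1**3 + 2 * b3 - 4 * b1 * b2)
    q22 = s * (4 * b4 - 8 * b1 * b3 - 6 * b2**2 + 16 * b1**2 * b2 - 6 * b1**4)
    # Centered traces (4.25a).
    v1 = tr_s - n * a1
    v2 = tr_s2 - n * (a2 + c * a1**2) - (2.0 / beta - 1.0) * a2 * c
    det = q11 * q22 - q12**2
    with np.errstate(divide="ignore", invalid="ignore"):
        quad = (q22 * v1**2 - 2 * q12 * v1 * v2 + q11 * v2**2) / det
        value = quad + np.log(det)
    # Guard against grid points where Q is not positive definite.
    return np.where(det > 0, value, np.inf)

def estimate_t_a(S, m, beta=2, step=0.005):
    """Grid-search minimizer of the objective over (t, a)."""
    n = S.shape[0]
    tr_s = np.trace(S).real
    tr_s2 = np.sum(np.abs(S)**2)  # equals Tr S^2 for Hermitian S
    t_grid = np.arange(0.05, 0.95 + step / 2, step)
    a_grid = np.arange(1.05, 4.0 + step / 2, step)
    T, A = np.meshgrid(t_grid, a_grid, indexing="ij")
    J = neg_log_likelihood(T, A, tr_s, tr_s2, n, m, beta)
    i, j = np.unravel_index(np.argmin(J), J.shape)
    return t_grid[i], a_grid[j]

# One complex-valued trial with t = 0.5, a = 2, n = 80, m = 160.
rng = np.random.default_rng(0)
n, m, t_true, a_true = 80, 160, 0.5, 2.0
sigma_diag = np.where(np.arange(n) < t_true * n, a_true, 1.0)
Z = (rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))) / np.sqrt(2)
X = np.sqrt(sigma_diag)[:, None] * Z
S = X @ X.conj().T / m
t_hat, a_hat = estimate_t_a(S, m)
print(t_hat, a_hat)
```

Repeating the trial over many independent draws and averaging the errors would give bias and mean squared error figures of the kind reported in the tables; a gradient-based optimizer could replace the grid search when more than two parameters are unknown.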
(a) m = 0.5n.

                  Complex case                         Real case
   n     m      Bias      MSE   MSE x n^2/100      Bias       MSE   MSE x n^2/100
  10     5         -        -         -               -         -         -
  20    10    0.0455   0.3658    1.4632          0.4862    1.2479    4.9915
  40    20   -0.0046   0.1167    1.8671          0.2430    0.3205    5.1272
  80    40   -0.0122   0.0337    2.1595          0.1137   0.08405    5.437
 160    80   -0.0024   0.0083    2.1250          0.0598   0.02084    5.335
 320   160    0.0008   0.0021    2.1790          0.0300   0.00528    5.406

(b) m = n.

                  Complex case                         Real case
   n     m      Bias      MSE   MSE x n^2/100      Bias       MSE   MSE x n^2/100
  10    10         -        -         -               -         -         -
  20    20   -0.0137   0.1299    0.5196          0.2243    0.3483    1.3932
  40    40   -0.0052   0.0390    0.6233          0.1083    0.0901    1.4412
  80    80   -0.0019   0.0093    0.5941          0.0605    0.0231    1.4787
 160   160   -0.0005   0.0024    0.6127          0.0303    0.0055    1.4106
 320   320   -0.0001   0.0006    0.6113          0.0162    0.0015    1.5155

(c) m = 2n.

                  Complex case                         Real case
   n     m      Bias      MSE   MSE x n^2/100      Bias       MSE   MSE x n^2/100
  10    20         -        -         -               -         -         -
  20    40   -0.0119   0.0420    0.1679          0.1085    0.1020    0.4081
  40    80   -0.0017   0.0109    0.1740          0.0563    0.0255    0.4079
  80   160   -0.0005   0.0028    0.1765          0.0290    0.0063    0.4056
 160   320   -0.0004   0.0007    0.1828          0.0151    0.0016    0.4139
 320   640    0.0001   0.0002    0.1752          0.0080    0.0004    0.4024

Table 4.3. Quality of estimation of t = 0.5 for different values of n (dimension of observation vector)
and m (number of samples): both real and complex case.
(a) m = 0.5n.

                  Complex case                         Real case
   n     m      Bias      MSE   MSE x n^2/100       Bias       MSE   MSE x n^2/100
  10     5         -        -         -                -         -         -
  20    10    0.1278   0.1046    0.4185          0.00748    0.1024    0.4097
  40    20    0.0674   0.0478    0.7647         -0.01835   0.04993    0.7989
  80    40    0.0238   0.0111    0.7116         -0.02240   0.01800    1.1545
 160    80    0.0055   0.0022    0.5639         -0.02146   0.00414    1.0563
 320   160    0.0007   0.0005    0.5418         -0.01263   0.00112    1.1692

(b) m = n.

                  Complex case                         Real case
   n     m      Bias      MSE   MSE x n^2/100       Bias       MSE   MSE x n^2/100
  10    10         -        -         -                -         -         -
  20    20    0.0750   0.0525    0.2099          -0.0019    0.0577    0.2307
  40    40    0.0227   0.0127    0.2028          -0.0206    0.0187    0.2992
  80    80    0.0052   0.0024    0.1544          -0.0206    0.0047    0.3007
 160   160    0.0014   0.0006    0.1499          -0.0126    0.0012    0.3065
 320   320    0.0003   0.0001    0.1447          -0.0074    0.0003    0.3407

(c) m = 2n.

                  Complex case                         Real case
   n     m      Bias      MSE   MSE x n^2/100       Bias       MSE   MSE x n^2/100
  10    20         -        -         -                -         -         -
  20    40    0.0251   0.0134    0.0534          -0.0182    0.0205    0.0821
  40    80    0.0049   0.0028    0.0447          -0.0175    0.0052    0.0834
  80   160    0.0015   0.0007    0.0428          -0.0115    0.0014    0.0865
 160   320    0.0004   0.0002    0.0434          -0.0067    0.0004    0.0920
 320   640    0.0000   0.0000    0.0412          -0.0038    0.0001    0.0932

Table 4.4. Quality of estimation of a = 2 for different values of n (dimension of observation vector)
and m (number of samples): both real and complex case.




[Figure 4-3: normal probability plots (vertical axis: probability). Panels: (a) a: n = 320, m = 640; (b) t: n = 320, m = 640; (c) a: n = 320, m = 640 (real valued); (d) t: n = 320, m = 640 (real valued).]

Figure 4-3. Normal probability plots of the estimates of a and t (true values: a = 2, t = 0.5).
■ 4.6 Inferential aspects of spiked covariance matrix models

Consider covariance matrix models whose eigenvalues are of the form $\lambda_1 \ge \lambda_2 \ge \ldots \ge
\lambda_k > \lambda_{k+1} = \ldots = \lambda_n = \lambda$. Such models arise when the signal occupies a k-dimensional
subspace and the noise has covariance $\lambda I$. Such models are referred to as spiked covariance
matrix models. When $k \ll n$, then for large n, for $v_\theta$ defined as in Proposition
4.31, the matrix $Q_\theta$ may be constructed from the moments of the (null) Wishart distribution
[33] instead, which are given by

\[ \alpha_k^{S} = \lambda^k \sum_{j=0}^{k-1} \frac{c^j}{j+1} \binom{k}{j} \binom{k-1}{j}, \qquad (4.34) \]

where $c = n/m$. Thus, for q = 2, $Q_\theta$ is given by

\[ Q_\theta \equiv Q_\lambda = \frac{2}{\beta} \begin{bmatrix} \lambda^2 c & 2\lambda^3 (c+1)\,c \\[4pt] 2\lambda^3 (c+1)\,c & 2\lambda^4 (2c^2 + 5c + 2)\,c \end{bmatrix}. \qquad (4.35) \]

This substitution is motivated by Bai and Silverstein's analysis [10] where it is shown
that when k is small relative to n, then the second order fluctuation distribution is
asymptotically independent of the "spikes." When the multiplicity of the spike is
known (say 1), then we let $t_1 = 1/n$ and compute the moments $\alpha_k^{S}$ accordingly. The
estimation problem thus reduces to

\[ \hat{\theta} = \arg\min_{\theta \in \Theta} \; v_\theta^{T} Q_\lambda^{-1} v_\theta \quad \text{with } q = \dim(v_\theta) = \dim(\theta) + 1, \qquad (4.36) \]

where $\lambda$ is an element of $\theta$ when it is unknown.
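
The null Wishart quantities used above are simple enough to evaluate directly. The sketch below is illustrative only: it assumes the standard Narayana-number form of the Marchenko-Pastur moments for (4.34), takes the matrix entries from (4.35), and uses hypothetical function names.

```python
from math import comb

def null_wishart_moment(k, c, lam=1.0):
    """k-th limiting moment of the null Wishart spectrum, as in (4.34).

    c is the ratio n/m and lam is the common population eigenvalue.
    """
    return lam**k * sum(c**j * comb(k, j) * comb(k - 1, j) / (j + 1)
                        for j in range(k))

def q_lambda(c, lam=1.0, beta=2):
    """2 x 2 covariance matrix of (4.35); beta = 1 (real) or 2 (complex)."""
    s = 2.0 / beta
    off_diag = s * 2 * lam**3 * (c + 1) * c
    return [[s * lam**2 * c, off_diag],
            [off_diag, s * 2 * lam**4 * (2 * c**2 + 5 * c + 2) * c]]

# Example with c = n/m = 0.5 and lam = 1.
moments = [null_wishart_moment(k, 0.5) for k in range(1, 5)]
Q = q_lambda(0.5)
```

With lam = 1 the first four moments are 1, 1 + c, 1 + 3c + c^2 and 1 + 6c + 6c^2 + c^3, which match the constant terms of the expressions in (4.32).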

Consider the problem of estimating the magnitude of the spike for the model in
(4.2) with $t_1 = 1/n$ and $a_2 = 1$ known and $a_1 = 10$ unknown, so that $\theta = a \equiv a_1$. We
obtain the estimate $\hat{\theta}$ from (4.36) with $\lambda = 1$, wherein the moments $\alpha_k^{S}$ given by

\[ \alpha_1^{S} = \frac{-1 + a + n}{n} \qquad (4.37a) \]

\[ \alpha_2^{S} = \frac{a^2 n - 2nc + c - 2ac + cn^2 + n^2 - n + 2nac + a^2 c}{n^2} \qquad (4.37b) \]

are obtained by plugging $t = 1/n$ into (4.32).

Table 4.5 summarizes the estimation performance for this example. Note the 1/n
scaling of the mean squared error and how the complex case has half the mean squared
error. The estimates produced are asymptotically normal as seen in Figure 4-4.
■ 4.6.1 Impact of the sample eigenvalue phase transition phenomenon

Consider testing the hypothesis that $\Sigma = I$ for the model in (4.2), which is
equivalent to testing $\theta = (1, 1)$. From the discussion in Section 4.4.2, we form the test
statistic

\[ H_{\mathrm{sph.}}: \; h(\theta) = v_\theta^{T} Q_\theta^{-1} v_\theta, \qquad (4.38) \]

where $Q_\theta$ is given by (4.35) with $\lambda = 1$ and

\[ v_\theta = \begin{bmatrix} \mathrm{Tr}\, S - n \\[4pt] \mathrm{Tr}\, S^2 - n\left(1 + \frac{n}{m}\right) - \left(\frac{2}{\beta} - 1\right)\frac{n}{m} \end{bmatrix}, \]

where $c = n/m$, as usual. Figure 4-5 compares quantiles of the test statistic, collected
over 4000 Monte-Carlo simulations, with the theoretical quantiles of the $\chi^2_2$ distribution.
The agreement validates the distributional approximation for modest values of n and m.

We set a threshold $\gamma = 5.9914$ so that we accept the sphericity hypothesis whenever
$h(\theta) \le \gamma$. This corresponds to the 95-th percentile of the $\chi^2_2$ distribution. Table 4.6(a)
demonstrates how the test is able to accept the hypothesis when $\Sigma = I$ close to the
0.95 significance level it was designed for.

Table 4.6(b) shows the acceptance of the sphericity hypothesis when $\Sigma =
\mathrm{diag}(10, 1, \ldots, 1)$ instead. Note how when n/m is large, the test erroneously accepts
the null hypothesis an inordinate number of times. The faulty inference provided by the
test based on the methodologies developed is not surprising given the phase transition
phenomenon for the sample eigenvalues described by the following result due to Baik and
Silverstein [12], Paul [70] and others [11].


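Proposition 4.61. Let S denote a sample covariance matrix formed from an n x
m matrix of Gaussian observations whose columns are independent of each other and
identically distributed with mean 0 and covariance $\Sigma$. Denote the eigenvalues of $\Sigma$ by
$\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_k > \lambda_{k+1} = \ldots = \lambda_n = \lambda$. Let $l_j$ denote the j-th largest eigenvalue of
S. Then as $n, m \to \infty$ with $c_m = n/m \to c \in (0, \infty)$,

\[ l_j \to \begin{cases} \lambda_j \left( 1 + \dfrac{\lambda c}{\lambda_j - \lambda} \right) & \text{if } \lambda_j > \lambda (1 + \sqrt{c}) \\[8pt] \lambda (1 + \sqrt{c})^2 & \text{if } \lambda_j \le \lambda (1 + \sqrt{c}) \end{cases} \qquad (4.39) \]

where the convergence is almost sure.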


Since the inference methodologies we propose in this paper exploit the distributional
properties of traces of powers of the sample covariance matrix, Proposition 4.61 pinpoints
the fundamental inability of the sphericity test proposed to reject the hypothesis
$\Sigma = I$ whenever (for large n, m),

\[ \lambda_i \le 1 + \sqrt{\frac{n}{m}}. \]

For the example considered, $\lambda_1 = 10$, so that the above condition is met whenever
$n/m > c_t = 81$. For n/m on the order of $c_t$, the resulting inability to correctly reject
the null hypothesis can be attributed to this phenomenon and the fluctuations of the
largest eigenvalue.
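
The threshold quoted above follows directly from Proposition 4.61 and can be checked numerically. The following sketch uses hypothetical function names and simply evaluates the two branches of (4.39) together with the critical ratio $(\lambda_j/\lambda - 1)^2$ obtained by solving $\lambda_j = \lambda(1 + \sqrt{c})$ for c.

```python
import math

def largest_eigenvalue_limit(lam_j, c, lam=1.0):
    """Almost-sure limit of the j-th sample eigenvalue from (4.39).

    lam_j: population "signal" eigenvalue
    c:     limiting ratio n/m
    lam:   population "noise" eigenvalue
    """
    if lam_j > lam * (1.0 + math.sqrt(c)):
        # Above the threshold the sample eigenvalue separates from the bulk.
        return lam_j * (1.0 + lam * c / (lam_j - lam))
    # At or below the threshold it sticks to the edge of the bulk.
    return lam * (1.0 + math.sqrt(c)) ** 2

def critical_ratio(lam_j, lam=1.0):
    """Value of n/m at which lam_j equals lam * (1 + sqrt(n/m))."""
    return (lam_j / lam - 1.0) ** 2

c_t = critical_ratio(10.0)                         # spike of magnitude 10
limit_small_c = largest_eigenvalue_limit(10.0, 1.0)
limit_large_c = largest_eigenvalue_limit(10.0, 100.0)
```

For a spike of magnitude 10 in unit-variance noise the critical ratio evaluates to 81, in agreement with the value of $c_t$ in the text; for ratios beyond it, the largest sample eigenvalue carries no information about the spike in the limit.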

Canonically speaking, eigen-inference methodologies which rely on traces of powers
of the sample covariance matrix will be unable to differentiate between closely spaced
population eigenvalues in high-dimensional, sample size starved settings. This impacts
the quality of the inference in a fundamental manner that is difficult to overcome.
At the same time, however, the results in [12] suggest that if the practitioner has
reason to believe that the population eigenvalues can be split into several clusters about
$a_i \pm \sqrt{n/m}$, then the use of the model in (4.2) with a block subspace structure, where
the individual blocks of sizes $n_1, \ldots, n_k$ are comparable to n, is justified. In such
situations, the benefit of the proposed eigen-methodologies will be most apparent and
might motivate experimental design that ensures that this condition is met.

■  4.7  Future  work 

In the development of the estimation procedures in this chapter, we ignored the correction
term for the mean that appears in the real covariance matrix case (see Proposition
4.31). This was because Bai and Silverstein expressed it as a contour integral which
appeared challenging to compute (see Eq. (1.6) in [10]). It is desirable to include this
extra term in the estimation procedure if it can be computed efficiently using symbolic
techniques. The recent work of Anderson and Zeitouni [5], despite its ambiguous title,
represents a breakthrough on this and other fronts.

Anderson and Zeitouni encode the correction term in the coefficients of a power
series that can be directly computed from the limiting moment series of the sample
covariance matrix (see Theorem 3.4 in [5]). Furthermore, they have expanded the range of
the theory for the fluctuations of traces of powers of large Wishart-like sample covariance
matrices, in the real sample covariance matrix case, to the situation when the entries
are composed from a broad class of admissible non-Gaussian distributions. In such a
scenario, the correction term takes into account the fourth moment of the distribution
(see Eq. (5) and Theorems 3.3-3.4 in [5]). This latter development might be of use
in some practical settings where the non-Gaussianity is well characterized. We have
yet to translate their results into a computational recipe for determining the correction
term, though we intend to do so at a later date. The numerical results presented show
the consistency of the proposed estimators; it would be of interest to establish this
analytically and identify conditions in the real covariance matrix case where ignoring
the correction term in the mean can severely degrade the quality of estimation.
(a) m = n.

                  Complex case                     Real case
   n     m      Bias      MSE    MSE x n       Bias       MSE     MSE x n
  10    10   -0.5528   9.3312    93.3120    -0.5612   18.4181    184.1808
  20    20   -0.2407   4.8444    96.8871    -0.2005    9.6207    192.4143
  40    40   -0.1168   2.5352   101.4074    -0.0427    4.9949    199.7965
  80    80   -0.0833   1.2419    99.3510   -0.03662    2.4994    199.9565
 160   160   -0.0371   0.6318   101.0949    0.03751    1.2268    196.3018
 320   320   -0.0125   0.3186   101.9388    0.04927    0.6420    204.4711

(b) m = 1.5n.

                  Complex case                     Real case
   n     m      Bias      MSE    MSE x n       Bias       MSE     MSE x n
  10    15   -0.3343   6.6954    66.9537    -0.3168   12.7099    127.0991
  20    30   -0.1781   3.2473    64.9454    -0.1454    6.4439    128.8798
  40    60   -0.1126   1.6655    66.6186   -0.08347    3.2470   129.88188
  80   120   -0.0565   0.8358    66.8600   -0.02661    1.6381   131.04739
 160   240   -0.0287   0.4101    65.6120    0.02318    0.8534    136.5475
 320   480   -0.0135   0.2083    66.6571    0.02168    0.4352    139.2527

(c) m = 2n.

                  Complex case                     Real case
   n     m      Bias      MSE    MSE x n       Bias       MSE     MSE x n
  10    20   -0.2319   4.9049    49.0494    -0.2764    9.6992     96.9922
  20    40   -0.1500   2.5033    50.0666    -0.1657    4.6752     93.5043
  40    80   -0.0687   1.2094    48.3761   -0.03922    2.5300    101.2007
  80   160   -0.0482   0.6214    49.7090   -0.02426    1.2252     98.0234
 160   320   -0.0111   0.3160    50.5613    0.01892    0.6273    100.3799
 320   640   -0.0139   0.1580    50.5636    0.02748    0.3267    104.5465

Table 4.5. Algorithm performance for different values of n (dimension of observation vector) and m
(number of samples): both real and complex case.


[Figure 4-4: normal probability plots (vertical axis: probability). Panels: (a) n = 320, m = 640 (complex S); (b) n = 320, m = 640 (real S).]

Figure 4-4. Normal probability plots of the spiked magnitude estimate (true value = 10).

[Figure 4-5: quantile-quantile plots (axis: quantiles of input sample). Panels: (a) n = 40, m = 20 (complex S); (b) n = 320, m = 100 (complex S).]

Figure 4-5. Sphericity test.
           m = 10   m = 20   m = 40   m = 80   m = 160   m = 320   m = 640
 n = 10    0.9491   0.9529   0.9490   0.9465    0.9489    0.9508    0.9498
 n = 20    0.9510   0.9478   0.9495   0.9514    0.9493    0.9511    0.9465
 n = 40    0.9534   0.9521   0.9480   0.9497    0.9514    0.9473    0.9483
 n = 80    0.9491   0.9457   0.9514   0.9547    0.9507    0.9512    0.9489
 n = 160   0.9507   0.9472   0.9490   0.9484    0.9464    0.9546    0.9482
 n = 320   0.9528   0.9458   0.9448   0.9509    0.9479    0.9486    0.9510

(a) Empirical probability of accepting the null hypothesis when $\Sigma = I$.

           m = 10   m = 20   m = 40   m = 80   m = 160   m = 320   m = 640
 n = 10    0.0009        -        -        -         -         -         -
 n = 20         -        -        -        -         -         -         -
 n = 40    0.0189        -        -        -         -         -         -
 n = 80    0.0829   0.0011        -        -         -         -         -
 n = 160   0.2349   0.0258   0.0002        -         -         -         -
 n = 320   0.4793   0.1568   0.0062        -         -         -         -

(b) Empirical probability of accepting the null hypothesis when $\Sigma = \mathrm{diag}(10, 1, \ldots, 1)$.

Table 4.6. The null hypothesis is accepted at the 95% significance level for $\chi^2_2$, or whenever $h(\theta) \le$
5.9914.
Chapter 5


The  Capon  beamformer: 
Approximating  the  output  distribution 


■  5.1  Introduction 

Given a set of independent identically distributed (i.i.d.) signal bearing observations
$X = [x_1, \ldots, x_m]$ where each vector is n x 1 zero mean complex circular Gaussian,
i.e., $x_i \sim \mathcal{CN}_n(0, R)$, i = 1, 2, ..., m, Capon proposed a filter-bank approach to power
spectral estimation in which he suggested the optimal design of linear filters that pass
the desired signal undistorted, while minimizing the power from all other sources of
interference [19].

Formally, when the n x n data covariance matrix is given by R and the assumed
n x 1 array response for a desired signal originating from angle $\theta$ is $v(\theta)$, the solution
to the following constrained optimization problem

\[ \min_{w} \; w^{H} R\, w \quad \text{such that} \quad w^{H} v(\theta) = 1 \qquad (5.1) \]

satisfies the minimum variance distortionless response (MVDR) criterion leading to the
Capon-MVDR filter

\[ w_{\mathrm{MVDR}} = \frac{R^{-1} v(\theta)}{v^{H}(\theta) R^{-1} v(\theta)}. \qquad (5.2) \]

The average output power of this optimal filter is given by

\[ P_{\mathrm{Capon}}(\theta) = E\left[ \, |w_{\mathrm{MVDR}}^{H} x|^2 \right] = \frac{1}{v^{H}(\theta) R^{-1} v(\theta)}, \qquad (5.3) \]

leading to the power spectral estimator

\[ \hat{P}_{\mathrm{Capon}}(\theta) = \frac{1}{v^{H}(\theta) \hat{R}^{-1} v(\theta)}, \qquad (5.4) \]

where $\hat{R} = (1/m) X X^{H}$ and m > n is assumed. This estimator results when $\hat{R}$ replaces
R in the expression for $w_{\mathrm{MVDR}}$ in (5.2), and this filter is subsequently applied to
the same data used to obtain the covariance estimate $\hat{R}$.

When the number of snapshots is greater than but on the order of the number of
sensors, i.e., $m \approx n$, the sample covariance matrix is ill-conditioned. Moreover, when
the number of snapshots is less than the number of sensors, i.e., m < n, the sample
covariance matrix is rank deficient (singular). In both of these scenarios the sample
covariance matrix is diagonally loaded with a loading value $\delta$ (it is necessary in the
latter case) to yield the estimate

\[ \hat{R}_\delta = \frac{1}{m} X X^{H} + \delta I. \qquad (5.5) \]

The justification for using $\hat{R}_\delta$ in place of $\hat{R}$ even when m > n is the observation that
doing so "robustifies" the signal processing [21,24,36]. The covariance matrix estimate
$\hat{R}_\delta$ thus formed may be interpreted as a structured covariance estimator; in the statistics
literature, such structured estimators are encountered in the context of shrinkage based
approaches to covariance matrix estimation (e.g., [26,58]).

Two power spectral estimators naturally follow from this modified covariance estimate.
The simplest power spectral estimate is obtained by replacing $\hat{R}$ with $\hat{R}_\delta$ in
(5.4), yielding the expression

\[ \hat{P}_{\mathrm{Capon}}(\theta, \delta) \equiv \hat{P}^{(1)}_{\mathrm{Capon}}(\theta, \delta) = \frac{1}{v^{H}(\theta) \hat{R}_\delta^{-1} v(\theta)}. \qquad (5.6) \]

This estimator was demonstrated to possess inherent robustness properties and yield
performance commensurate with the Multiple Signal Classification (MUSIC) algorithm
[36].

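The alternate estimator is obtained by reformulating Capon's approach to obtain
the constrained optimization problem

\[ \min_{w} \; w^{H} \hat{R}_\delta\, w \quad \text{such that} \quad w^{H} v(\theta) = 1. \qquad (5.7) \]

This leads to the filter

\[ w_\delta = \frac{\hat{R}_\delta^{-1} v(\theta)}{v^{H}(\theta) \hat{R}_\delta^{-1} v(\theta)}. \qquad (5.8) \]

The average output power of this filter conditioned on $\hat{R}_\delta$ is given by

\[ E\left[ \, |w_\delta^{H} x|^2 \mid \hat{R}_\delta \right] = \frac{v^{H}(\theta) \hat{R}_\delta^{-1} R\, \hat{R}_\delta^{-1} v(\theta)}{\left[ v^{H}(\theta) \hat{R}_\delta^{-1} v(\theta) \right]^2}. \qquad (5.9) \]

Replacing R with $\hat{R}$ in (5.9) yields the second form of a diagonally loaded Capon
spectral estimator

\[ \hat{P}^{(2)}_{\mathrm{Capon}}(\theta, \delta) = \frac{v^{H}(\theta) \hat{R}_\delta^{-1} \hat{R}\, \hat{R}_\delta^{-1} v(\theta)}{\left[ v^{H}(\theta) \hat{R}_\delta^{-1} v(\theta) \right]^2}. \qquad (5.10) \]

Note that

\[ \lim_{\delta \to \infty} \hat{P}^{(2)}_{\mathrm{Capon}}(\theta, \delta) \propto v^{H}(\theta) \hat{R}\, v(\theta) = \hat{P}_{\mathrm{Bartlett}}(\theta), \qquad (5.11) \]

so that, as the diagonal loading value increases, this adaptive spectral estimate approaches
its conventional beamforming counterpart, known as the Bartlett spectral estimator
[102]. While both power spectral estimators are of interest to the array processing
community, we focus on the estimator of the form in (5.4) in this chapter.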
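
The filters and estimators of this section translate almost line by line into NumPy. The sketch below is illustrative only: the function names are hypothetical, the steering vector in the example is an arbitrary unit-modulus vector rather than a manifold vector of any particular array, and the last check relies on the large-loading limit in which the loaded inverse behaves like a scaled identity.

```python
import numpy as np

def mvdr_weights(R, v):
    """Capon-MVDR filter R^{-1} v / (v^H R^{-1} v), as in (5.2)."""
    Rinv_v = np.linalg.solve(R, v)
    return Rinv_v / (v.conj() @ Rinv_v)

def capon_power(R, v):
    """Output power 1 / (v^H R^{-1} v), as in (5.3)."""
    return 1.0 / np.real(v.conj() @ np.linalg.solve(R, v))

def loaded_covariance(X, delta):
    """Diagonally loaded sample covariance (1/m) X X^H + delta I, as in (5.5)."""
    n, m = X.shape
    return X @ X.conj().T / m + delta * np.eye(n)

def capon_loaded_first(X, v, delta):
    """First loaded estimator: 1 / (v^H R_delta^{-1} v)."""
    return capon_power(loaded_covariance(X, delta), v)

def capon_loaded_second(X, v, delta):
    """Second loaded estimator: v^H Rd^{-1} R_hat Rd^{-1} v / (v^H Rd^{-1} v)^2."""
    n, m = X.shape
    R_hat = X @ X.conj().T / m
    u = np.linalg.solve(R_hat + delta * np.eye(n), v)
    return np.real(u.conj() @ R_hat @ u) / np.real(v.conj() @ u) ** 2

# Small example with an arbitrary Hermitian positive definite R.
rng = np.random.default_rng(1)
n, m = 6, 24
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
R = A @ A.conj().T + np.eye(n)
v = np.exp(1j * 0.3 * np.arange(n))
w = mvdr_weights(R, v)
X = (rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))) / np.sqrt(2)
bartlett = np.real(v.conj() @ (X @ X.conj().T / m) @ v)
```

The weights satisfy the distortionless constraint $w^{H} v = 1$, and for a very large loading value `capon_loaded_second` approaches `bartlett` divided by $(v^{H} v)^2$, which is the proportionality stated in (5.11).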

■  5.2  Problem  formulation 

Consider the situation where the i.i.d. observation vectors $x_i$ for i = 1, ..., m,
distributed as $\mathcal{CN}(0, R)$, have covariance matrix of the form

\[ R = V(\theta) R_s V(\theta)^{H} + \sigma^2 I, \]

where the n x k matrix $V(\theta) = [v(\theta_1), \ldots, v(\theta_k)]$, $R_s$ is the k x k covariance matrix of
the amplitudes of the k signals, and $\sigma^2$ is the variance of the noise process. In array
processing applications this models a situation where there are k Gaussian random
sources at $\theta_1, \ldots, \theta_k$ with array manifold vectors $v(\theta_1), \ldots, v(\theta_k)$ and we can treat the
observation vector $x_i$ as the superposition of these k Gaussian signals embedded in
white noise.

The manifold vector $v(\theta_i)$ associated with the i-th source is parametrized by the
angular location of the source with respect to a chosen coordinate system. The elements
of the manifold vector encode how the waves (e.g., electromagnetic or acoustic) impinge
on the elements of the sensor array. The manifold (or replica) vector captures the
degree of correlation between wavefronts arriving from different directions at the various
elements of a sensor array.

The geometry, the relative placement of the sensors on the array, and the propagation
characteristics of the operating medium thus play an important role when
determining the dependence of the manifold vector on the direction of arrival. This
dependence on the direction of arrival $\theta$ can be explicitly represented for many array
configurations [102, Chapters 2-4]. The simplest array configuration is the uniform linear
array depicted in Figure 5-1. Here, as the name suggests, the n sensors are placed
uniformly along a line. The manifold vector for this configuration is the n x 1 vector
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Source 

m 


Figure  5-1.  Configuration  of  a  uniform  linear  array  with  n  sensors  and  inter-elements  spacing  of  d 
units. 


where  d  is  the  inter-element  spacing  of  the  sensors  and  £  is  the  wavelength  of  the 
propagating  wave  (same  units  as  k )  so  that  determining  ratio  d/f  is  the  inter-element 
spacing  hi  wavelengths. 
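As a concrete illustration of such a manifold vector, the following minimal sketch builds one for a uniform linear array. The phase convention (first sensor as phase reference, $\theta$ measured from the array axis so that 90 degrees is broadside, positive exponent) and the name `ula_manifold` are assumptions made for illustration; other sign and reference conventions are equally common.

```python
import cmath
import math


def ula_manifold(theta_deg, n, d_over_lambda):
    """Return an n x 1 ULA manifold vector as a list of complex numbers.

    Assumed convention: entry k is exp(j * 2*pi * k * (d/lambda) * cos(theta)),
    with theta measured from the array axis (90 degrees = broadside).
    """
    phase_step = 2.0 * math.pi * d_over_lambda * math.cos(math.radians(theta_deg))
    return [cmath.exp(1j * k * phase_step) for k in range(n)]


# Example: the 18-element array with d/lambda = 0.45 used later in the chapter.
v = ula_manifold(70.0, n=18, d_over_lambda=0.45)
norm_sq = sum(abs(x) ** 2 for x in v)  # every entry has unit modulus
```

Under this convention every entry has unit modulus, so the squared norm of the manifold vector equals the number of sensors regardless of the scan angle.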

When the manifold vector is known, the Capon power spectral estimator can be used to detect the number of signals in white noise. The Capon estimator $\widehat{P}_{\mathrm{Capon}}(\theta)$ is an estimate of the spatial power spectrum as a function of the scan angle $\theta$. The number of signals present can be estimated by scanning the angle space and determining the number of peaks obtained.

This is illustrated in Figure 5-2 where the theoretical power spectral estimate $P_{\mathrm{Capon}}(\theta)$ in (5.3) is compared with the estimates $\widehat{P}_{\mathrm{Capon}}(\theta, \delta)$ formed using (5.4) for $\delta = 0$ and $\delta = 10$ when $n = m/2 = 18$. Here the observation vectors were sampled in the scenario where $k = 2$ and the two sources are independent, $\mathbf{R}_s = \mathrm{diag}(\sigma_1^2, \sigma_2^2)$, where $\sigma_1^2 = \sigma_2^2 = 100$ with $\sigma^2 = 1$, $\theta_1 = 90^\circ$, $\theta_2 = 70^\circ$, and $d/\lambda = 0.45$. As in Figure 5-2, the sources will (generally) manifest as peaks in the spatial power spectrum estimate.
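The peak-counting idea can be sketched for the theoretical spectrum of this two-source scenario. The sketch below assumes $P_{\mathrm{Capon}}(\theta) = 1/(\mathbf{v}^H(\theta)\mathbf{R}^{-1}\mathbf{v}(\theta))$, consistent with (5.16), an assumed ULA phase convention, and an illustrative peak threshold of 1 (the noise power); the function names and the threshold are hypothetical choices, not taken from the text. The matrix inversion lemma reduces $\mathbf{R}^{-1}$ to a $2 \times 2$ inverse, so only the standard library is needed.

```python
import cmath
import math


def ula_manifold(theta_deg, n, d_over_lambda):
    # Assumed convention: theta from the array axis, first sensor as reference.
    step = 2.0 * math.pi * d_over_lambda * math.cos(math.radians(theta_deg))
    return [cmath.exp(1j * k * step) for k in range(n)]


def inner(a, b):
    """Return a^H b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def capon_power(v, v1, v2, p1, p2):
    """1 / (v^H R^{-1} v) for R = p1 v1 v1^H + p2 v2 v2^H + I.

    Uses R^{-1} = I - V (Rs^{-1} + V^H V)^{-1} V^H with V = [v1 v2].
    """
    a1, a2 = inner(v1, v), inner(v2, v)
    m11 = 1.0 / p1 + inner(v1, v1).real
    m22 = 1.0 / p2 + inner(v2, v2).real
    m12 = inner(v1, v2)
    det = m11 * m22 - abs(m12) ** 2
    quad = (m22 * abs(a1) ** 2 + m11 * abs(a2) ** 2
            - 2.0 * (a1.conjugate() * m12 * a2).real) / det
    return 1.0 / (inner(v, v).real - quad)


n, d_over_lambda = 18, 0.45
v1 = ula_manifold(90.0, n, d_over_lambda)
v2 = ula_manifold(70.0, n, d_over_lambda)
grid = [0.5 * i for i in range(361)]  # scan 0..180 degrees in 0.5 degree steps
spec = [capon_power(ula_manifold(t, n, d_over_lambda), v1, v2, 100.0, 100.0)
        for t in grid]

threshold = 1.0  # illustrative: only count peaks above the noise power
peaks = [grid[i] for i in range(1, len(grid) - 1)
         if spec[i] > spec[i - 1] and spec[i] >= spec[i + 1] and spec[i] > threshold]
```

The threshold is needed because the spectrum also has small local maxima in the sidelobe region; how to set it in the presence of estimation noise is exactly the question the rest of the chapter addresses.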

Note that the underlying setup is identical to that considered in Chapter 3; however, unlike the eigen-inference solution proposed in Chapter 3, the Capon-MVDR beamformer exploits information about the eigenvectors of $\mathbf{R}$ encoded by means of the manifold vector $\mathbf{v}(\theta)$. Consequently, provided there is no mismatch between the assumed manifold vector and the true manifold vector, the Capon-MVDR beamformer should be able to identify signals with power levels below the identifiability threshold in Section 3.7.

Detecting the number of sources from the spatial power estimate is challenging because the estimate $\widehat{P}_{\mathrm{Capon}}(\theta, \delta)$ is a random variable that is a function of the random sample covariance matrix $\widehat{\mathbf{R}}$, which has the (complex) Wishart distribution. Thus it becomes important to characterize the distribution of the output $\widehat{P}_{\mathrm{Capon}}(\theta, \delta)$ and its dependence on $P_{\mathrm{Capon}}(\theta)$, the loading level $\delta$, the number of sensors $n$, and the sample size $m$. This can facilitate the judicious selection of thresholds for testing the hypothesis that a signal is present while controlling the false discovery rate.

Figure 5-2. The Capon-MVDR spectral estimate when n = 18, m = 36.

In the situation when $m > n$ and $\delta = 0$, the distribution of the outputs is given by the famous Capon-Goodman result [20] which we shall revisit in Section 5.3. The result captures the bias and the variance in $\widehat{P}_{\mathrm{Capon}}(\theta)$ relative to $P_{\mathrm{Capon}}(\theta)$ (the bias can be seen in Figure 5-2) due to finite sample size. The corresponding question for the situation when the Capon-MVDR beamformer is diagonally loaded, i.e., when $\delta > 0$ and general $n$ and $m$, has remained outstanding in the array processing literature for over four decades. Baggeroer and Cox emphasize its importance and the analytical void in [7, pp. 105]. In their words:

“When using a limited number of snapshots and diagonal loading, significant biases are introduced which can be misleading vis a vis the level where a weak signal can be detected. The Capon-Goodman formula is valid only for the case of no loading with m > n which is typically not the case for sonars ... Except for the very special case of a single snapshot, we are not aware of any analytic results for the bias when m < n and/or loading is applied even when m > n.”

In this chapter, we solve this problem by providing analytical expressions for the bias when $m < n$ and $m > n$ and a loading value of $\delta$ is applied. We provide stochastic approximations for the distribution of $\widehat{\psi}_{\mathrm{Capon}}(\theta, \delta) = 1/\widehat{P}_{\mathrm{Capon}}(\theta, \delta)$ as a function of $n$, $m$, and $\delta$ for the situation where there are no sources, a single source, and two sources (or a source and an interferer). The results apply for arbitrary array configuration and include the manifold vector mismatch case.

In principle, the results can be extended to approximate the distribution of the outputs when there are an arbitrary number of sources in white noise. However, as we shall shortly see, there will be an accompanying combinatorial explosion in the number of terms needed to obtain an "accurate" approximation. When the expressions start becoming that cumbersome to write down, it is perhaps reasonable to question what, if any, analytical insight they yield that can help the user in practical matters such as determining the "optimal" diagonal loading value or compensating for the induced biases. We suggest that extensions of this work focus on using the relatively simpler approximations in the canonical scenario of two or fewer signals in white noise to piece together a usable approximation when there are more than two signals. This is a matter we shall defer to a later, more thorough investigation.

We note that, in contrast, the distribution of $\widehat{P}_{\mathrm{Capon}}(\theta, \delta)$ in the sidelobes, i.e., for values of $\theta$ that are not in the proximity of the true signal directions, can be approximated by a Normal distribution with mean and variance that are related, in closed form, to the loading value $\delta$, the number of sensors $n$ and the number of snapshots $m$. The sidelobe level distribution statistic is hence likely to be of greatest utility to the practitioner since it facilitates the setting of sidelobe level thresholds that avoid false signal discovery.

The remainder of this chapter is organized as follows. We review the Capon-Goodman result for the case when $\delta = 0$ and $m > n$ in Section 5.3 with the objective of identifying what makes it difficult to extend their analysis to the situation when $\delta \neq 0$. The relevant result from random matrix theory, due to Jack Silverstein [81], is isolated in Section 5.4 and applied in Sections 5.5, 5.6, and 5.7 to characterize the distribution of the Capon-MVDR beamformer outputs where there is no source, a single source and two sources in white noise, respectively. The results are validated using numerical simulations in Section 5.8; extensions and directions for future research are briefly discussed in Section 5.9.


■  5.3  The  Capon-Goodman  result 

When $\delta = 0$ and $m > n$ the distribution of $\widehat{P}_{\mathrm{Capon}}(\theta)$ (when appropriately normalized) is equal to a chi-squared distribution with $2(m - n + 1)$ degrees of freedom (this is equivalent to a so-called complex chi-squared distribution with $m - n + 1$ degrees of freedom). This is implicitly stated in the result that follows.

Proposition 5.31. Let $\widehat{\mathbf{R}}$ be the sample covariance matrix formed from $m$ independent, identically distributed complex-valued observation vectors $\mathbf{x}_1, \ldots, \mathbf{x}_m$ where, for each $i = 1, \ldots, m$, $\mathbf{x}_i \sim \mathcal{CN}(\mathbf{0}, \mathbf{R})$. When $m > n$, and if $\mathrm{Prob}(\mathbf{a} = \mathbf{0}) = 0$ then,

$$\frac{\mathbf{a}^H \mathbf{R}^{-1} \mathbf{a}}{\mathbf{a}^H \widehat{\mathbf{R}}^{-1} \mathbf{a}} \sim \chi^2_{2(m-n+1)}.$$

PROOF. The statement, in the real-valued case, follows from Theorem 3.2.12 in [67, pp. 96]. The complex case was derived by Capon and Goodman in [20]. □

From Proposition 5.31, an application of the well-known formulas for the mean and variance of the chi-squared distribution leads to the famous Capon and Goodman expressions, in (5.14a), for the mean and variance of the Capon-MVDR spectral estimator. The mean and variance of $\widehat{\psi}_{\mathrm{Capon}}(\theta)$ are computed as well using the properties of the inverse chi-squared distribution.

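Corollary 5.32. Assume $\widehat{\mathbf{R}}$, as in Proposition 5.31, is an estimate of the true covariance matrix $\mathbf{R}$. Let $P_{\mathrm{Capon}}(\theta)$ and $\widehat{P}_{\mathrm{Capon}}(\theta) = \widehat{P}_{\mathrm{Capon}}(\theta, 0)$ be defined as in (5.3) and (5.4) respectively. Then, assuming $m > n$,

$$E\left[\widehat{P}_{\mathrm{Capon}}(\theta)\right] = \frac{m - n + 1}{m}\, P_{\mathrm{Capon}}(\theta) \tag{5.14a}$$

$$\mathrm{var}\left[\widehat{P}_{\mathrm{Capon}}(\theta)\right] = \frac{m - n + 1}{m^2}\, P_{\mathrm{Capon}}(\theta)^2. \tag{5.14b}$$

Define $\psi_{\mathrm{Capon}}(\theta) = 1/P_{\mathrm{Capon}}(\theta)$ and $\widehat{\psi}_{\mathrm{Capon}}(\theta) = 1/\widehat{P}_{\mathrm{Capon}}(\theta)$. Then, assuming $m > n + 2$,

$$E\left[\widehat{\psi}_{\mathrm{Capon}}(\theta)\right] = \frac{m}{m - n}\, \psi_{\mathrm{Capon}}(\theta) \tag{5.15a}$$

$$\mathrm{var}\left[\widehat{\psi}_{\mathrm{Capon}}(\theta)\right] = \frac{m^2}{(m - n)^2 (m - n - 1)}\, \psi_{\mathrm{Capon}}(\theta)^2. \tag{5.15b}$$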


Equation (5.14a) captures the degradation in the quality of the power spectral estimate due to sample size constraints. Our objective is to mimic the Capon-Goodman result by characterizing the distribution of the Capon-MVDR outputs for the case when $\delta > 0$ and $m < n$. Before we do so, we revisit the special case when $\delta = 0$ with the goal of identifying the key property that allows us to characterize the output distribution as simply as in Proposition 5.31. This will provide insight into why, when $\delta \neq 0$, the characterization has eluded researchers for over four decades.
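To make the finite-sample degradation tangible, the short sketch below evaluates the mean and variance of the unloaded Capon power estimate, i.e., the factors $(m-n+1)/m$ and $(m-n+1)/m^2$ from the Capon-Goodman result, and expresses the bias in decibels. The function name is a hypothetical choice for illustration.

```python
import math


def capon_goodman_power_moments(p_true, n, m):
    """Mean and variance of the unloaded Capon estimate, valid for m > n.

    mean = (m - n + 1) / m   * p_true
    var  = (m - n + 1) / m^2 * p_true^2
    """
    if m <= n:
        raise ValueError("the Capon-Goodman result requires m > n")
    nu = m - n + 1
    return nu / m * p_true, nu / m ** 2 * p_true ** 2


# n = 18 sensors and m = 36 snapshots, as in Figure 5-2.
mean, var = capon_goodman_power_moments(1.0, n=18, m=36)
bias_db = 10.0 * math.log10(mean)  # bias relative to the true spectrum, in dB
```

For $n = 18$ and $m = 36$ the estimate is biased low by a little under 3 dB at every scan angle, which is the uniform downward shift visible for the unloaded estimate.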


■  5.3.1  Structure  exploited  in  Capon  and  Goodman's  analysis 

We first consider the scenario with no diagonal loading, i.e., $\delta = 0$. The inverse of the MVDR beamformer output is then given by

$$\widehat{\psi}_{\mathrm{Capon}}(\theta) = \widehat{\psi}_{\mathrm{Capon}}(\theta, 0) = \mathbf{v}^H(\theta)\,\widehat{\mathbf{R}}^{-1}\,\mathbf{v}(\theta). \tag{5.16}$$

Assume that $m > n$ so that the sample covariance matrix is not singular and can be decomposed as

$$\widehat{\mathbf{R}} = \mathbf{R}^{1/2}\,\mathbf{W}(c)\,\mathbf{R}^{1/2}, \tag{5.17}$$

where $\mathbf{R}$ is the true covariance matrix and $\mathbf{W}(c)$ (here and henceforth) is the complex Wishart random matrix with identity covariance. We parameterize the Wishart matrix by $c = n/m$, which is the ratio of the number of sensors to snapshots.

When there is no diagonal loading, $\widehat{\psi}_{\mathrm{Capon}}(\theta)$, in (5.16), can be decomposed as

$$\widehat{\psi}_{\mathrm{Capon}}(\theta) = \underbrace{\mathbf{v}^H(\theta)\,\mathbf{R}^{-1}\,\mathbf{v}(\theta)}_{\text{Deterministic term}} \;\; \underbrace{\mathbf{u}^H(\theta)\,\mathbf{W}(c)^{-1}\,\mathbf{u}(\theta)}_{\text{Stochastic term}} \tag{5.18}$$

where $\mathbf{u}(\theta)$ is a unit vector such that $\mathbf{R}^{-1/2}\mathbf{v}(\theta) = \alpha(\theta)\,\mathbf{u}(\theta)$ and $|\alpha(\theta)|^2 = \psi_{\mathrm{Capon}}(\theta) = \mathbf{v}^H(\theta)\,\mathbf{R}^{-1}\,\mathbf{v}(\theta)$.

Recall that $\psi_{\mathrm{Capon}}(\theta) = 1/P_{\mathrm{Capon}}(\theta)$ where $P_{\mathrm{Capon}}(\theta)$ is the spectral estimate when the true covariance matrix $\mathbf{R}$ is known. The stochastic term is a quadratic form involving the Wishart matrix. Thus, as a function of $\theta$, when there is no diagonal loading, the probability distribution of the MVDR beamformer is completely characterized by the single stochastic term in (5.18), which has an inverse chi-squared distribution from Proposition 5.31.

In essence the decomposability of the quadratic form into the stochastic and the deterministic components is exploited in the derivation of the chi-squared distribution for $\widehat{P}_{\mathrm{Capon}}(\theta)$ in the famous Capon-Goodman paper [20]. The ability to do so implies the true covariance matrix $\mathbf{R}$ appears in the solution only in the form of a deterministic scale factor as in (5.18). This means that the relative bias and variance of the outputs will be identical across the entire scan angle space, as can be seen in (5.14a). More importantly, the distribution thus computed applies for arbitrary $\mathbf{R}$ so that the model in (5.12) is merely a special case.

When the Capon-MVDR processor is diagonally loaded, it is no longer possible to decouple the stochastic part from the deterministic part. In particular, the distribution of $\widehat{\psi}(\theta, \delta)$ will explicitly depend on the structure of the true covariance matrix and the approximations we develop will only apply for the model in (5.12).


■  5.4  Relevant  result  from  random  matrix  theory 

The distributional approximations for $\widehat{\psi}(\theta, \delta)$ that we shall develop rely on the following asymptotic characterization of quadratic forms of functions of complex Wishart matrices with identity covariance.


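Proposition 5.41. Let $\mathbf{u}$ and $\mathbf{u}_\perp$ be two fixed mutually orthogonal $n \times 1$ unit vectors. Let $\mathbf{W}_\delta(c) = \mathbf{W}(c) + \delta\,\mathbf{I}_n$ where $\mathbf{W}(c)$ is a complex Wishart matrix with identity covariance. Then, as $n, m \to \infty$ with $n/m \to c > 0$,

$$\sqrt{n}\left(\mathbf{u}^H \mathbf{W}_\delta^{-1}(c)\,\mathbf{u} - \mu_\delta\right) \xrightarrow{D} q_1 \sim \mathcal{N}(0, \sigma_\delta^2) \tag{5.19a}$$

$$\sqrt{n}\left(\mathbf{u}^H \mathbf{W}_\delta^{-1}(c)\,\mathbf{u}_\perp\right) \xrightarrow{D} q_2 \sim \mathcal{CN}(0, \sigma_\delta^2/2) \tag{5.19b}$$

$$\sqrt{n}\left(\mathbf{u}_\perp^H \mathbf{W}_\delta^{-1}(c)\,\mathbf{u}_\perp - \mu_\delta\right) \xrightarrow{D} q_3 \sim \mathcal{N}(0, \sigma_\delta^2) \tag{5.19c}$$

where the convergence in distribution holds almost surely and

$$\mu_\delta = \frac{-1 + c - \delta + \sqrt{1 - 2c + 2\delta + c^2 + 2c\delta + \delta^2}}{2c\,\delta} \tag{5.20}$$

$$\sigma_\delta^2 = -\frac{d\mu_\delta}{d\delta} - \mu_\delta^2. \tag{5.21}$$

Proof. The result follows from an extension [81] of the techniques developed by Silverstein in [84, 85]. The mean and variance are obtained by evaluating the integrals

$$\mu_\delta = \int \frac{1}{x + \delta}\, dF^W(x) \tag{5.22a}$$

$$\sigma_\delta^2 = \int \frac{1}{(x + \delta)^2}\, dF^W(x) - \mu_\delta^2 \tag{5.22b}$$

where $dF^W$ is the Marčenko-Pastur density in (3.15). □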
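The closed forms (5.20) and (5.21) are straightforward to evaluate numerically. The sketch below is one way to do so, using the analytic derivative of (5.20) for the variance; the function names are hypothetical. As a sanity check, for $c < 1$ and $\delta \to 0$ the values should approach $1/(1-c)$ and $c/(1-c)^3$, the unloaded limits used in (5.24).

```python
import math


def mu_delta(c, delta):
    """Mean term (5.20): (-1 + c - delta + sqrt((delta + 1 + c)^2 - 4c)) / (2 c delta)."""
    s = math.sqrt((delta + 1.0 + c) ** 2 - 4.0 * c)
    return (-1.0 + c - delta + s) / (2.0 * c * delta)


def sigma2_delta(c, delta):
    """Variance term (5.21): -d(mu_delta)/d(delta) - mu_delta^2."""
    s = math.sqrt((delta + 1.0 + c) ** 2 - 4.0 * c)
    num = -1.0 + c - delta + s
    dnum = -1.0 + (delta + 1.0 + c) / s  # derivative of the numerator in delta
    dmu = (dnum * delta - num) / (2.0 * c * delta ** 2)  # quotient rule
    return -dmu - mu_delta(c, delta) ** 2


# Snapshot-deficient example: n = 36, m = 18 (c = 2) with loading delta = 10.
mu_example = mu_delta(2.0, 10.0)
sigma2_example = sigma2_delta(2.0, 10.0)
```

Note that the expression under the square root is the expanded form of $(\delta + 1 + c)^2 - 4c$, which is how it is coded here; for very small $\delta$ the subtraction in the numerator loses precision, so a series expansion may be preferable in that regime.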

Thus, for large enough $n$ and $m$, Proposition 5.41 suggests that we can approximate the quadratic forms $r_1 = \mathbf{u}^H \mathbf{W}_\delta^{-1}(c)\,\mathbf{u}$, $r_2 = \mathbf{u}^H \mathbf{W}_\delta^{-1}(c)\,\mathbf{u}_\perp$ and $r_3 = \mathbf{u}_\perp^H \mathbf{W}_\delta^{-1}(c)\,\mathbf{u}_\perp$ by independent Gaussian random variables, where $r_1$ and $r_3$ are identically distributed real-valued Gaussian random variables with mean $\mu_\delta$ and variance $\sigma_\delta^2/n$, and $r_2$ is a complex-valued Gaussian random variable whose real and imaginary parts are independent and identically distributed Gaussian random variables with mean zero and variance $\sigma_\delta^2/(4n)$. In deriving the distributional approximation, whenever we encounter such quadratic forms formed from orthogonal unit vectors, we shall replace them with independent normally distributed variables as in Proposition 5.41.

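The accuracy of this asymptotic approximation even when $n$ and $m$ are of moderate size can be discerned by comparing (5.15) with the result obtained using Proposition 5.41. From (5.18) and (5.19a) we have

$$\widehat{\psi}_{\mathrm{Capon}}(\theta) = \mathbf{v}^H(\theta)\,\mathbf{R}^{-1}\,\mathbf{v}(\theta)\;\mathbf{u}^H \mathbf{W}^{-1}(c)\,\mathbf{u} = \psi_{\mathrm{Capon}}(\theta)\, r_1 \tag{5.23}$$

where

$$r_1 \sim \mathcal{N}\left(\frac{1}{1 - c},\; \frac{1}{n}\,\frac{c}{(1 - c)^3}\right) \tag{5.24}$$

so that we obtain the approximations

$$E\left[\widehat{\psi}_{\mathrm{Capon}}(\theta)\right] \approx \frac{m}{m - n}\,\psi_{\mathrm{Capon}}(\theta) \tag{5.25a}$$

$$\mathrm{var}\left[\widehat{\psi}_{\mathrm{Capon}}(\theta)\right] \approx \frac{m^2}{(m - n)^3}\,\psi_{\mathrm{Capon}}(\theta)^2 \tag{5.25b}$$

where we have substituted $c = n/m$ in (5.24). Comparing (5.15) with (5.25) shows the accuracy of the approximation used.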
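The comparison between the exact moments of $\widehat{\psi}_{\mathrm{Capon}}$ and their asymptotic counterparts can be made explicit with a few lines of arithmetic. The sketch below returns the factors multiplying $\psi_{\mathrm{Capon}}(\theta)$ and $\psi_{\mathrm{Capon}}(\theta)^2$ in each case; the function names are hypothetical.

```python
def psi_factors_exact(n, m):
    """Exact mean and variance factors: m/(m-n) and m^2 / ((m-n)^2 (m-n-1))."""
    k = m - n
    return m / k, m ** 2 / (k ** 2 * (k - 1))


def psi_factors_asymptotic(n, m):
    """Asymptotic factors with c = n/m: 1/(1-c) and c / (n (1-c)^3)."""
    c = n / m
    return 1.0 / (1.0 - c), c / (n * (1.0 - c) ** 3)


exact_mean, exact_var = psi_factors_exact(18, 36)
approx_mean, approx_var = psi_factors_asymptotic(18, 36)
var_ratio = approx_var / exact_var  # equals (m - n - 1) / (m - n)
```

The means agree exactly, and the variances differ only by the factor $(m-n-1)/(m-n)$, which is $17/18$ for $n = 18$ and $m = 36$.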
■  5.5  Distribution  under  diagonal  loading:  No  sources

Consider the null hypothesis where there are no sources in white noise of variance $\sigma^2 = 1$ so that $\mathbf{R} = \mathbf{I}$. We assume that diagonal loading is applied, i.e., $\delta > 0$. The statistic

$$\widehat{\psi}_{\mathrm{Capon}}(\theta, \delta) = \mathbf{v}^H(\theta)\left(\mathbf{W}(c) + \delta\,\mathbf{I}_n\right)^{-1}\mathbf{v}(\theta) \tag{5.26}$$

can be approximated using Proposition 5.41 as follows. Figure 5-3 validates the approximation for moderate $m$ and $n$. This approximation can also be used in the sidelobe region when there are many sources in white noise.


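Approximation 5.51 (No sources/sidelobe region).

$$\widehat{\psi}(\theta, \delta) = r_1 \sim \mathcal{N}\left(\|\mathbf{v}(\theta)\|^2\,\mu_\delta,\; \|\mathbf{v}(\theta)\|^4\,\sigma_\delta^2/n\right) \tag{5.27}$$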
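One practical use of this Normal approximation is setting a sidelobe threshold. The sketch below builds the approximating distribution of $\widehat{\psi}(\theta, \delta)$ and reads off a low quantile; since a large power estimate corresponds to a small $\widehat{\psi}$, the reciprocal of that quantile serves as a threshold on the power estimate. The target exceedance probability of $10^{-3}$, the helper names, and this particular thresholding rule are illustrative assumptions rather than a procedure given in the text.

```python
import math
from statistics import NormalDist


def mu_sigma2(c, delta):
    """Closed forms (5.20)-(5.21) for the mean and variance terms."""
    s = math.sqrt((delta + 1.0 + c) ** 2 - 4.0 * c)
    num = -1.0 + c - delta + s
    mu = num / (2.0 * c * delta)
    dmu = ((-1.0 + (delta + 1.0 + c) / s) * delta - num) / (2.0 * c * delta ** 2)
    return mu, -dmu - mu ** 2


def sidelobe_psi_distribution(v_norm_sq, n, m, delta):
    """Normal approximation: mean ||v||^2 mu, variance ||v||^4 sigma^2 / n."""
    mu, s2 = mu_sigma2(n / m, delta)
    return NormalDist(mu=v_norm_sq * mu, sigma=v_norm_sq * math.sqrt(s2 / n))


# n = 36, m = 18, delta = 10 (the setting of Figure 5-3(a)), with ||v||^2 = n.
dist = sidelobe_psi_distribution(36.0, n=36, m=18, delta=10.0)
psi_low = dist.inv_cdf(1e-3)        # psi falls below this with probability 1e-3
power_threshold = 1.0 / psi_low     # corresponding threshold on the power estimate
```

A power estimate in the sidelobe region would then exceed `power_threshold` with probability of roughly $10^{-3}$ under the approximation.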


■  5.6  Distribution  under  diagonal  loading:  Single  source 


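Consider the scenario where there is a single source. Assume, without loss of generality, that $\sigma^2 = 1$ so that the covariance matrix $\mathbf{R} = \sigma_S^2\,\mathbf{v}(\theta_S)\mathbf{v}(\theta_S)^H + \mathbf{I}$ where $\theta_S$ is the direction of the source and $\sigma_S^2$ is the corresponding source power. Given a vector $\mathbf{v}(\theta)$, we construct the unit vector $\mathbf{u}(\theta)$ as

$$\mathbf{u}(\theta) = \frac{\mathbf{v}(\theta)}{\|\mathbf{v}(\theta)\|}. \tag{5.28}$$

Note that $\mathbf{u}(\theta_S)$ is an eigenvector of $\mathbf{R}$. The covariance matrix $\mathbf{R}$ can hence be decomposed as

$$\mathbf{R} = (\alpha + 1)\,\mathbf{u}(\theta_S)\mathbf{u}^H(\theta_S) + \mathbf{U}_\perp(\theta_S)\mathbf{U}_\perp^H(\theta_S), \tag{5.29}$$

where $\alpha = \sigma_S^2\,\|\mathbf{v}(\theta_S)\|^2$ and $\mathbf{U}_\perp(\theta_S)$ is an $n \times (n - 1)$ matrix orthogonal to $\mathbf{u}(\theta_S)$ such that $\mathbf{U}_\perp^H(\theta_S)\mathbf{U}_\perp(\theta_S) = \mathbf{I}_{n-1}$. Hence, we have

$$\mathbf{R}^{-1} = \frac{1}{\alpha + 1}\,\mathbf{u}(\theta_S)\mathbf{u}^H(\theta_S) + \mathbf{U}_\perp(\theta_S)\mathbf{U}_\perp^H(\theta_S), \tag{5.30}$$

so that

$$\widehat{\psi}(\theta, \delta) = \mathbf{v}^H(\theta)\,\widehat{\mathbf{R}}_\delta^{-1}\,\mathbf{v}(\theta), \tag{5.31}$$

with $\widehat{\mathbf{R}}_\delta = \widehat{\mathbf{R}} + \delta\,\mathbf{I}$. We can rewrite (5.31) as

$$\widehat{\psi}(\theta, \delta) = \mathbf{v}^H(\theta)\,\mathbf{R}^{-1/2}\left(\mathbf{W}(c) + \delta\,\mathbf{R}^{-1}\right)^{-1}\mathbf{R}^{-1/2}\,\mathbf{v}(\theta). \tag{5.32}$$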


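Figure 5-3. Assessing the validity of the approximated distribution for $\widehat{\psi}(\theta, \delta)$ when there are no sources in white noise. Panels: (a) n = 36, m = 18, δ = 10; (b) n = 108, m = 54, δ = 10. Vertical axis: probability.

Let us first consider the matricial term $\left(\mathbf{W}(c) + \delta\,\mathbf{R}^{-1}\right)^{-1}$, which may be rewritten as

$$\left(\mathbf{W}(c) + \delta\,\mathbf{R}^{-1}\right)^{-1} = \left(\mathbf{W}(c) + \delta\,\mathbf{I} + \delta\,\mathbf{R}^{-1} - \delta\,\mathbf{I}\right)^{-1} = \Big(\mathbf{W}_\delta(c) + \underbrace{\delta\,\mathbf{R}^{-1} - \delta\,\mathbf{I}}_{\mathbf{D}}\Big)^{-1}.$$

From (5.30), we note that the matrix $\mathbf{D}$ is a rank-one matrix of the form

$$\mathbf{D} = \underbrace{\delta\left(\frac{1}{\alpha + 1} - 1\right)}_{1/d}\,\mathbf{u}(\theta_S)\mathbf{u}^H(\theta_S). \tag{5.33}$$

Using the Sherman-Morrison-Woodbury matrix inversion lemma [41], we have

$$\left(\mathbf{W}(c) + \delta\,\mathbf{R}^{-1}\right)^{-1} = \mathbf{W}_\delta^{-1}(c) - \frac{\mathbf{W}_\delta^{-1}(c)\,\mathbf{u}(\theta_S)\mathbf{u}^H(\theta_S)\,\mathbf{W}_\delta^{-1}(c)}{d + \mathbf{u}^H(\theta_S)\,\mathbf{W}_\delta^{-1}(c)\,\mathbf{u}(\theta_S)}. \tag{5.34}$$

In (5.32), the term $\mathbf{v}^H(\theta)\,\mathbf{R}^{-1/2}$ can also be written in terms of $\mathbf{u}(\theta_S)$ as

$$\mathbf{v}^H(\theta)\,\mathbf{R}^{-1/2} = \|\mathbf{v}(\theta)\|\,\frac{\langle \mathbf{u}(\theta), \mathbf{u}(\theta_S)\rangle}{\sqrt{\alpha + 1}}\,\mathbf{u}^H(\theta_S) + \|\mathbf{v}(\theta)\|\,\langle \mathbf{u}(\theta), \mathbf{U}_\perp(\theta_S)\rangle\,\mathbf{U}_\perp^H(\theta_S) \tag{5.35}$$

or, equivalently, as

$$\mathbf{v}^H(\theta)\,\mathbf{R}^{-1/2} = \beta\,\mathbf{u}^H(\theta_S) + \gamma\,\mathbf{u}_\perp^H(\theta_S), \tag{5.36}$$

where $\beta = \|\mathbf{v}(\theta)\|\,\langle \mathbf{u}(\theta), \mathbf{u}(\theta_S)\rangle / \sqrt{\alpha + 1}$, and $\mathbf{u}_\perp(\theta_S)$ is an $n \times 1$ unit vector such that $\gamma\,\mathbf{u}_\perp^H(\theta_S)$ is equal to the second term on the right hand side of (5.35).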

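On substituting (5.34) and (5.36) into (5.32), and performing some algebraic manipulations, we obtain the expression in (5.37) below. Applying Proposition 5.41 gives us the stochastic approximation for the distribution of $\widehat{\psi}(\theta, \delta)$ composed using independent normally distributed random variables as described below.

Approximation 5.61 (Single source in white noise).

$$\widehat{\psi}(\theta, \delta) = |\beta|^2\,\frac{d\,r_1}{d + r_1} + |\gamma|^2\left(r_3 - \frac{|r_2|^2}{d + r_1}\right) + 2\,\mathrm{Re}\left(\beta\gamma^*\,\frac{d\,r_2}{d + r_1}\right) \tag{5.37}$$

where

$$r_1 = \mathbf{u}^H(\theta_S)\,\mathbf{W}_\delta^{-1}(c)\,\mathbf{u}(\theta_S) \sim \mathcal{N}(\mu_\delta, \sigma_\delta^2/n) \tag{5.38a}$$

$$r_2 = \mathbf{u}^H(\theta_S)\,\mathbf{W}_\delta^{-1}(c)\,\mathbf{u}_\perp(\theta_S) \sim \mathcal{CN}(0, \sigma_\delta^2/2n) \tag{5.38b}$$

$$r_3 = \mathbf{u}_\perp^H(\theta_S)\,\mathbf{W}_\delta^{-1}(c)\,\mathbf{u}_\perp(\theta_S) \sim \mathcal{N}(\mu_\delta, \sigma_\delta^2/n) \tag{5.38c}$$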

It is worth noting how much more complicated the structure of (5.37) is compared to (5.27). This is evidence of the fact that when diagonal loading is applied, the probability distribution of the outputs depends in a more complicated manner on the underlying structure of the covariance matrix $\mathbf{R}$. Nonetheless, (5.37) is an exact expression for $\widehat{\psi}_{\mathrm{Capon}}(\theta, \delta)$ and the approximated stochastic representation relies on treating the variables $r_1$, $r_2$, and $r_3$ as independent Gaussian random variables. Though the distribution does not have a nicely expressible density function, we can efficiently sample from it using this approximate stochastic representation.
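Sampling from the approximate representation can be sketched as follows. The closed form used below is what substituting (5.34) and (5.36) into (5.32) yields, namely $|\beta|^2 \frac{d r_1}{d + r_1} + |\gamma|^2 (r_3 - \frac{|r_2|^2}{d + r_1}) + 2\,\mathrm{Re}(\beta\gamma^* \frac{d r_2}{d + r_1})$, and should be read as a derived form rather than a verbatim transcription. Further assumptions, all flagged in the comments: the manifold norm is the same at every angle (true for a ULA), the input `rho_abs` is $|\langle \mathbf{u}(\theta), \mathbf{u}(\theta_S)\rangle|$, and the phase of $\beta\gamma^*$ is dropped because $r_2$ is modeled as circularly symmetric. All function and parameter names are hypothetical.

```python
import math
import random


def mu_sigma2(c, delta):
    """Closed forms (5.20)-(5.21) for the mean and variance terms."""
    s = math.sqrt((delta + 1.0 + c) ** 2 - 4.0 * c)
    num = -1.0 + c - delta + s
    mu = num / (2.0 * c * delta)
    dmu = ((-1.0 + (delta + 1.0 + c) / s) * delta - num) / (2.0 * c * delta ** 2)
    return mu, -dmu - mu ** 2


def sample_psi_single_source(n, m, delta, source_power, v_norm_sq, rho_abs,
                             size, seed=0):
    """Draw samples of psi-hat(theta, delta) for one source in white noise."""
    mu, s2 = mu_sigma2(n / m, delta)
    alpha = source_power * v_norm_sq            # assumes ||v|| is angle-independent
    d = 1.0 / (delta * (1.0 / (alpha + 1.0) - 1.0))   # from (5.33); negative
    beta2 = v_norm_sq * rho_abs ** 2 / (alpha + 1.0)  # |beta|^2
    gamma2 = v_norm_sq * (1.0 - rho_abs ** 2)         # |gamma|^2
    beta_gamma = math.sqrt(beta2 * gamma2)            # phase absorbed into r2
    sd_real = math.sqrt(s2 / n)                       # std of r1 and r3
    sd_part = math.sqrt(s2 / (4.0 * n))               # std of Re(r2) and Im(r2)
    rng = random.Random(seed)
    samples = []
    for _ in range(size):
        r1 = rng.gauss(mu, sd_real)
        r3 = rng.gauss(mu, sd_real)
        r2 = complex(rng.gauss(0.0, sd_part), rng.gauss(0.0, sd_part))
        denom = d + r1
        samples.append(beta2 * d * r1 / denom
                       + gamma2 * (r3 - abs(r2) ** 2 / denom)
                       + 2.0 * beta_gamma * d * r2.real / denom)
    return samples


# Scan angle on the source: n = 18, m = 36, delta = 10, source power 100.
on_source = sample_psi_single_source(18, 36, 10.0, 100.0, 18.0, 1.0, 2000, seed=1)
mean_on_source = sum(on_source) / len(on_source)
```

Each draw costs three Gaussian variates, irrespective of $n$ and $m$, whereas a direct Monte Carlo simulation would require forming and inverting an $n \times n$ sample covariance matrix per draw.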


■  5.7  Distribution  under  diagonal  loading:  Two  sources 

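When there are two sources in white noise of variance $\sigma^2 = 1$ we have

$$\mathbf{R} = [\mathbf{v}(\theta_1)\;\mathbf{v}(\theta_2)]\,\mathbf{R}_s\,[\mathbf{v}(\theta_1)\;\mathbf{v}(\theta_2)]^H + \mathbf{I} = [\mathbf{u}_1\;\mathbf{u}_2]\begin{bmatrix}\alpha_1 & 0\\ 0 & \alpha_2\end{bmatrix}[\mathbf{u}_1\;\mathbf{u}_2]^H + \mathbf{I},$$

where $\mathbf{u}_1$ and $\mathbf{u}_2$ are the eigenvectors corresponding to the two largest "signal" eigenvalues of $\mathbf{R}$. The inverse covariance matrix is given by

$$\mathbf{R}^{-1} = \frac{1}{\alpha_1 + 1}\,\mathbf{u}_1\mathbf{u}_1^H + \frac{1}{\alpha_2 + 1}\,\mathbf{u}_2\mathbf{u}_2^H + \mathbf{U}_\perp\mathbf{U}_\perp^H, \tag{5.41}$$

so that the matrix $\mathbf{D} = \delta\,\mathbf{R}^{-1} - \delta\,\mathbf{I}$, given by

$$\mathbf{D} = \underbrace{\delta\left(\frac{1}{\alpha_1 + 1} - 1\right)}_{1/d_1}\mathbf{u}_1\mathbf{u}_1^H + \underbrace{\delta\left(\frac{1}{\alpha_2 + 1} - 1\right)}_{1/d_2}\mathbf{u}_2\mathbf{u}_2^H \tag{5.42}$$

$$\phantom{\mathbf{D}} = [\mathbf{u}_1\;\mathbf{u}_2]\begin{bmatrix}1/d_1 & 0\\ 0 & 1/d_2\end{bmatrix}[\mathbf{u}_1\;\mathbf{u}_2]^H, \tag{5.43}$$

is a rank-two matrix. Applying the Sherman-Morrison-Woodbury matrix inversion lemma to $(\mathbf{W}_\delta(c) + \mathbf{D})^{-1}$ we have

$$(\mathbf{W}_\delta(c) + \mathbf{D})^{-1} = \mathbf{W}_\delta(c)^{-1} - \mathbf{W}_\delta^{-1}(c)\,[\mathbf{u}_1\;\mathbf{u}_2]\,\mathbf{T}_\delta^{-1}\,[\mathbf{u}_1\;\mathbf{u}_2]^H\,\mathbf{W}_\delta^{-1}(c), \tag{5.44}$$

where

$$\mathbf{T}_\delta = \begin{bmatrix}d_1 & 0\\ 0 & d_2\end{bmatrix} + \begin{bmatrix}\mathbf{u}_1^H\mathbf{W}_\delta^{-1}(c)\,\mathbf{u}_1 & \mathbf{u}_1^H\mathbf{W}_\delta^{-1}(c)\,\mathbf{u}_2\\ \mathbf{u}_2^H\mathbf{W}_\delta^{-1}(c)\,\mathbf{u}_1 & \mathbf{u}_2^H\mathbf{W}_\delta^{-1}(c)\,\mathbf{u}_2\end{bmatrix}. \tag{5.45}$$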

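To simplify the analysis we assume that $\delta$ is large enough so that $|\mathbf{u}_1^H\mathbf{W}_\delta^{-1}(c)\,\mathbf{u}_2| \ll \mathbf{u}_1^H\mathbf{W}_\delta^{-1}(c)\,\mathbf{u}_1$, so that the approximation

$$\mathbf{T}_\delta^{-1} \approx \begin{bmatrix}\dfrac{1}{d_1 + \mathbf{u}_1^H\mathbf{W}_\delta^{-1}(c)\,\mathbf{u}_1} & 0\\[2ex] 0 & \dfrac{1}{d_2 + \mathbf{u}_2^H\mathbf{W}_\delta^{-1}(c)\,\mathbf{u}_2}\end{bmatrix} \tag{5.46}$$

holds. Substituting (5.46) into (5.44) we have

$$(\mathbf{W}_\delta(c) + \mathbf{D})^{-1} \approx \mathbf{W}_\delta^{-1}(c) - \frac{\mathbf{W}_\delta^{-1}(c)\,\mathbf{u}_1\mathbf{u}_1^H\,\mathbf{W}_\delta^{-1}(c)}{d_1 + \mathbf{u}_1^H\mathbf{W}_\delta^{-1}(c)\,\mathbf{u}_1} - \frac{\mathbf{W}_\delta^{-1}(c)\,\mathbf{u}_2\mathbf{u}_2^H\,\mathbf{W}_\delta^{-1}(c)}{d_2 + \mathbf{u}_2^H\mathbf{W}_\delta^{-1}(c)\,\mathbf{u}_2}. \tag{5.47}$$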

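Assume that we can decompose $\mathbf{v}^H(\theta)\,\mathbf{R}^{-1/2}$ as

$$\mathbf{v}^H(\theta)\,\mathbf{R}^{-1/2} = \|\mathbf{v}(\theta)\|\,\frac{\langle \mathbf{u}(\theta), \mathbf{u}_1\rangle}{\sqrt{\alpha_1 + 1}}\,\mathbf{u}_1^H + \|\mathbf{v}(\theta)\|\,\frac{\langle \mathbf{u}(\theta), \mathbf{u}_2\rangle}{\sqrt{\alpha_2 + 1}}\,\mathbf{u}_2^H + \|\mathbf{v}(\theta)\|\,\langle \mathbf{u}(\theta), \mathbf{U}_\perp\rangle\,\mathbf{U}_\perp^H \tag{5.48}$$

so that we have

$$\mathbf{v}^H(\theta)\,\mathbf{R}^{-1/2} = \beta_1\,\mathbf{u}_1^H + \beta_2\,\mathbf{u}_2^H + \gamma\,\mathbf{u}_\perp^H, \tag{5.49}$$

where $\beta_1$, $\beta_2$ and $\gamma$ are non-random parameters obtained by comparing the representations in (5.48) and (5.49), term by term. Once these parameters are computed for the non-random $\mathbf{u}_1$ and $\mathbf{u}_2$ given by (5.39) we can apply Proposition 5.41 to obtain the approximated stochastic representation for $\widehat{\psi}(\theta, \delta)$ given below.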


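Approximation 5.71 (Two sources in white noise).

$$\begin{aligned}
\widehat{\psi}(\theta, \delta) \approx{}& |\beta_1|^2\left(\frac{d_1 r_1}{d_1 + r_1} - \frac{|r_2|^2}{d_2 + r_4}\right) + |\beta_2|^2\left(\frac{d_2 r_4}{d_2 + r_4} - \frac{|r_2|^2}{d_1 + r_1}\right) \\
&+ |\gamma|^2\left(r_6 - \frac{|r_3|^2}{d_1 + r_1} - \frac{|r_5|^2}{d_2 + r_4}\right) + 2\,\mathrm{Re}\left[\beta_1\beta_2^*\left(\frac{d_1 r_2}{d_1 + r_1} - \frac{r_2 r_4}{d_2 + r_4}\right)\right] \\
&+ 2\,\mathrm{Re}\left[\beta_1\gamma^*\left(\frac{d_1 r_3}{d_1 + r_1} - \frac{r_2 r_5}{d_2 + r_4}\right)\right] + 2\,\mathrm{Re}\left[\beta_2\gamma^*\left(\frac{d_2 r_5}{d_2 + r_4} - \frac{r_2^* r_3}{d_1 + r_1}\right)\right]
\end{aligned} \tag{5.50}$$

where

$$r_1 = \mathbf{u}_1^H\mathbf{W}_\delta^{-1}(c)\,\mathbf{u}_1 \sim \mathcal{N}(\mu_\delta, \sigma_\delta^2/n) \tag{5.51a}$$

$$r_2 = \mathbf{u}_1^H\mathbf{W}_\delta^{-1}(c)\,\mathbf{u}_2 \sim \mathcal{CN}(0, \sigma_\delta^2/2n) \tag{5.51b}$$

$$r_3 = \mathbf{u}_1^H\mathbf{W}_\delta^{-1}(c)\,\mathbf{u}_\perp \sim \mathcal{CN}(0, \sigma_\delta^2/2n) \tag{5.51c}$$

$$r_4 = \mathbf{u}_2^H\mathbf{W}_\delta^{-1}(c)\,\mathbf{u}_2 \sim \mathcal{N}(\mu_\delta, \sigma_\delta^2/n) \tag{5.51d}$$

$$r_5 = \mathbf{u}_2^H\mathbf{W}_\delta^{-1}(c)\,\mathbf{u}_\perp \sim \mathcal{CN}(0, \sigma_\delta^2/2n) \tag{5.51e}$$

$$r_6 = \mathbf{u}_\perp^H\mathbf{W}_\delta^{-1}(c)\,\mathbf{u}_\perp \sim \mathcal{N}(\mu_\delta, \sigma_\delta^2/n) \tag{5.51f}$$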


Note that there are two levels of approximation in the stochastic representation derived: firstly, that $\mathbf{T}_\delta^{-1}$ could be written in the simpler form as in (5.46), and secondly, that the six quadratic forms $r_i$ can be treated as independent random variables.

The conditions under which the first assumption holds need to be rigorously investigated.
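Because the two-source representation has many terms, it helps to check the expanded expression against a compact evaluation. The sketch below compares an expanded sum of six groups of terms, obtained here by expanding the quadratic form under the diagonal approximation (5.46), with a direct evaluation in the three-dimensional coordinates spanned by $\mathbf{u}_1$, $\mathbf{u}_2$, $\mathbf{u}_\perp$. The expanded form is a derived reconstruction and the function names are hypothetical; the test values for the $r_i$ are arbitrary numbers, not samples.

```python
def psi_two_sources_expanded(b1, b2, g, d1, d2, r1, r2, r3, r4, r5, r6):
    """Expanded two-source representation under the approximation (5.46)."""
    e1, e2 = d1 + r1, d2 + r4
    return (abs(b1) ** 2 * (d1 * r1 / e1 - abs(r2) ** 2 / e2)
            + abs(b2) ** 2 * (d2 * r4 / e2 - abs(r2) ** 2 / e1)
            + abs(g) ** 2 * (r6 - abs(r3) ** 2 / e1 - abs(r5) ** 2 / e2)
            + 2.0 * (b1 * b2.conjugate() * r2 * (d1 / e1 - r4 / e2)).real
            + 2.0 * (b1 * g.conjugate() * (d1 * r3 / e1 - r2 * r5 / e2)).real
            + 2.0 * (b2 * g.conjugate()
                     * (d2 * r5 / e2 - r2.conjugate() * r3 / e1)).real)


def psi_two_sources_direct(b1, b2, g, d1, d2, r1, r2, r3, r4, r5, r6):
    """Same quantity via x^H G x - |G x|_1^2/(d1+r1) - |G x|_2^2/(d2+r4).

    G holds the quadratic forms of W_delta^{-1} in the basis (u1, u2, u_perp)
    and x holds the coordinates of R^{-1/2} v in that basis.
    """
    gram = [[r1, r2, r3],
            [r2.conjugate(), r4, r5],
            [r3.conjugate(), r5.conjugate(), r6]]
    x = [b1.conjugate(), b2.conjugate(), g.conjugate()]
    gx = [sum(gram[i][j] * x[j] for j in range(3)) for i in range(3)]
    quad = sum(x[i].conjugate() * gx[i] for i in range(3)).real
    return quad - abs(gx[0]) ** 2 / (d1 + r1) - abs(gx[1]) ** 2 / (d2 + r4)


# Arbitrary illustrative values (not drawn from any distribution).
args = (0.3 + 0.1j, -0.2 + 0.4j, 1.5 - 0.5j, -0.11, -0.12,
        0.09, 0.002 - 0.001j, -0.001 + 0.003j, 0.092, 0.0015 + 0.002j, 0.091)
expanded = psi_two_sources_expanded(*args)
direct = psi_two_sources_direct(*args)
```

Agreement between the two evaluations only confirms the algebra of the expansion; it says nothing about the accuracy of (5.46) itself, which is the open question noted above.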

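We can also use the approximation (5.50) to derive an approximation for the matrix $E[(\widehat{\mathbf{R}} + \delta\,\mathbf{I})^{-1}]$. Note that in (5.50), when $\mathbf{v}(\theta) = \mathbf{u}_1$ then $\beta_1 \neq 0$ but $\beta_2 = \gamma = 0$ in (5.49) so that we have

$$E\left[\mathbf{u}_1^H(\widehat{\mathbf{R}} + \delta\,\mathbf{I})^{-1}\mathbf{u}_1\right] = E\left[|\beta_1|^2\left(\frac{d_1 r_1}{d_1 + r_1} - \frac{|r_2|^2}{d_2 + r_4}\right)\right].$$

The quadratic forms $\mathbf{u}_2^H(\widehat{\mathbf{R}} + \delta\,\mathbf{I})^{-1}\mathbf{u}_2$ and $\mathbf{u}_\perp^H(\widehat{\mathbf{R}} + \delta\,\mathbf{I})^{-1}\mathbf{u}_\perp$ exhibit a similar simple structure. Correspondingly, the quadratic form $\mathbf{u}_1^H(\widehat{\mathbf{R}} + \delta\,\mathbf{I})^{-1}\mathbf{u}_2$ can be expressed as

$$\mathbf{u}_1^H(\widehat{\mathbf{R}} + \delta\,\mathbf{I})^{-1}\mathbf{u}_2 = \beta_1\beta_2^*\, r_2\left(\frac{d_1}{d_1 + r_1} - \frac{r_4}{d_2 + r_4}\right) \tag{5.52}$$

so that, since $E[r_2] = 0$, we have

$$E\left[\mathbf{u}_1^H(\widehat{\mathbf{R}} + \delta\,\mathbf{I})^{-1}\mathbf{u}_2\right] \approx 0. \tag{5.53}$$

This suggests that the matrix $E[(\widehat{\mathbf{R}} + \delta\,\mathbf{I})^{-1}]$ is (approximately) diagonalized by the eigenvectors of $\mathbf{R}$ so that it can be approximated as in (5.55). We can use Taylor series expansions to obtain the diagonal elements of $\boldsymbol{\Lambda}_\delta$ that capture the effect of the diagonal loading value $\delta$ and the ratio $c = n/m$. Once we compute this matrix, it is easy to compute $E[\widehat{\psi}(\theta, \delta)]$ since we have

$$E\left[\widehat{\psi}(\theta, \delta)\right] = \mathbf{v}(\theta)^H\, E\left[(\widehat{\mathbf{R}} + \delta\,\mathbf{I})^{-1}\right]\mathbf{v}(\theta). \tag{5.54}$$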

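Approximation 5.72 (Two sources in white noise).

$$E\left[(\widehat{\mathbf{R}} + \delta\,\mathbf{I})^{-1}\right] \approx \mathbf{U}\,\boldsymbol{\Lambda}_\delta\,\mathbf{U}^H \tag{5.55}$$

where $\mathbf{U}$ diagonalizes $\mathbf{R}$ and $\boldsymbol{\Lambda}_\delta$ is a diagonal matrix with

$$(\boldsymbol{\Lambda}_\delta)_{i,i} = \begin{cases}
|\beta_1|^2\, E\left[\dfrac{d_1 r_1}{d_1 + r_1} - \dfrac{|r_2|^2}{d_2 + r_4}\right] & \text{for } i = 1,\\[2ex]
|\beta_2|^2\, E\left[\dfrac{d_2 r_4}{d_2 + r_4} - \dfrac{|r_2|^2}{d_1 + r_1}\right] & \text{for } i = 2,\\[2ex]
E\left[r_6 - \dfrac{|r_3|^2}{d_1 + r_1} - \dfrac{|r_5|^2}{d_2 + r_4}\right] & \text{for } i = 3, \ldots, n.
\end{cases} \tag{5.56}$$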


■  5.8  Numerical  examples 

Consider a scenario involving a single source plus interferer and a set of signal-bearing snapshots $\mathbf{x}_i \sim \mathcal{CN}[\mathbf{0},\, \mathbf{I}_n + \sigma_T^2\,\mathbf{v}(\theta_T)\mathbf{v}(\theta_T)^H + \sigma_I^2\,\mathbf{v}(\theta_I)\mathbf{v}(\theta_I)^H]$ for $i = 1, 2, \ldots, m$ for an $n = 18$ element uniform linear array (ULA) with slightly less than $\lambda/2$ element spacing. The array has a 3 dB beamwidth of 7.2 degrees and the desired target signal is arbitrarily placed at $\theta_T = 90$ degrees (array broadside) while the interferer is arbitrarily placed at $\theta_I = 70$ degrees.

Figures 5-4 and 5-5 illustrate the agreement between the predicted inverse of the diagonally loaded Capon-MVDR spectral estimator, i.e., the denominator of (5.4), and the results obtained for the same from 4000 Monte Carlo simulations (red circles) for snapshot deficient cases. Note that the Capon-Goodman result cannot be used here since it requires $m > n$ and no diagonal loading. The denominator of the Capon estimator in (5.3) constructed using the true covariance matrix is plotted for reference.

Consider yet another such scenario involving a single source plus interferer and a set of signal-bearing snapshots $\mathbf{x}_i \sim \mathcal{CN}[\mathbf{0},\, \mathbf{I}_n + \sigma_T^2\,\mathbf{v}(\theta_T)\mathbf{v}(\theta_T)^H + \sigma_I^2\,\mathbf{v}(\theta_I)\mathbf{v}(\theta_I)^H]$, for $i = 1, 2, \ldots, m$ for an $n = 18$ element uniform linear array (ULA) with slightly less than $\lambda/2$ element spacing. The array has a 3 dB beamwidth of 7.2 degrees and the desired target signal is arbitrarily placed at $\theta_T = 90$ degrees (array broadside) while the interferer is arbitrarily placed at $\theta_I = 95$ degrees so that the source and interferer are closely spaced.

Figure 5-6 illustrates the agreement between the predicted inverse of the diagonally loaded Capon-MVDR spectral estimator, i.e., the denominator of (5.4), and the results obtained for the same from 4000 Monte Carlo simulations (red circles) for snapshot deficient cases. Note how, as the source and interferer power reduces from 20 dB to 0 dB, the resolution of the Capon-MVDR beamformer is adversely affected, as observed in practice. The denominator of the Capon estimator in (5.3) constructed using the true covariance matrix is plotted for reference. The availability and great accuracy of these analytical predictions in the snapshot deficient case promises to have a major impact on the analysis of the Capon-MVDR algorithm beyond the threshold SNR where its performance is known to degrade dramatically.

■  5.9  Future  work 

In the spirit of the original Capon-Goodman result, for the case with no diagonal loading, we were able to use the knowledge of the distribution to analytically predict the beampattern induced by diagonal loading. In other words, we approximated the distribution of the random variable $\widehat{\psi}_{\mathrm{Capon}}(\theta, \delta)$ as a function of $\theta$ for a given value of $\delta$. The predictions were shown to be accurate vis a vis the numerical simulations. The most important implication of this for practice is that this understanding can help facilitate the analysis, for the first time, of the performance of applications such as direction-of-arrival (DOA) estimation or direction finding (DF) that use a diagonally loaded Capon-MVDR processor in the snapshot deficient case. Initial results in this direction may be found in [75].

Figure 5-4. Two equal power sources at 90 degrees and 70 degrees. Panels: (a) n = 18, m = 4, δ = 0.1, $\sigma_T^2 = \sigma_I^2 = 100$; (b) n = 18, m = 4, δ = 10, $\sigma_T^2 = \sigma_I^2 = 100$.

Figure 5-5. Two equal power sources at 90 degrees and 70 degrees. Panels: (a) n = 18, m = 6, δ = 0.1, $\sigma_T^2 = \sigma_I^2 = 100$; (b) n = 18, m = 6, δ = 10, $\sigma_T^2 = \sigma_I^2 = 100$.

Figure 5-6. Two equal power sources at 90 degrees and 95 degrees. Panels: (a) n = 18, m = 4, δ = 10, $\sigma_T^2 = \sigma_I^2 = 100$; (b) n = 18, m = 4, δ = 10, $\sigma_T^2 = \sigma_I^2 = 1$.


Chapter  6 


The  polynomial  method: 
Mathematical  foundation 


This chapter marks the start of the second part of the dissertation. Here, we lay the foundation for a powerful method that allows us to calculate the limiting eigenvalue distribution of a large class of random matrices. We see this method as allowing us to expand our reach beyond the well known special random matrices whose limiting distributions have the semi-circle density [113], the Marčenko-Pastur density [59], the McKay density [61] or their close cousins [22, 82].

In particular, we encode transforms of the limiting eigenvalue distribution function as solutions of a bivariate polynomial equation. Then canonical operations on the random matrices become operations on the bivariate polynomials. Before delving into a description of a class of random matrices for which this characterization applies, we describe the various ways in which transforms of the underlying probability distribution function can be encoded and manipulated.


■  6.1  Transform  representations 

■  6.1.1  The  Stieltjes  transform  and  some  minor  variations 

The Stieltjes transform of the distribution function $F^A(x)$ is given by

$$m_A(z) = \int \frac{1}{x - z}\, dF^A(x) \quad \text{for } z \in \mathbb{C}^+ \setminus \mathbb{R}. \tag{6.1}$$

The Stieltjes transform may be interpreted as the expectation

$$m_A(z) = E_x\left[\frac{1}{x - z}\right],$$

with respect to the random variable $x$ with distribution function $F^A(x)$. Consequently, for any invertible function $h(x)$ continuous over the support of $dF^A(x)$, the Stieltjes transform $m_A(z)$ can also be written in terms of the distribution of the random variable $y = h(x)$ as

$$m_A(z) = E_x\left[\frac{1}{x - z}\right] = E_y\left[\frac{1}{h^{\langle -1 \rangle}(y) - z}\right], \tag{6.2}$$

where $h^{\langle -1 \rangle}(\cdot)$ is the inverse of $h(\cdot)$ with respect to composition, i.e., $h(h^{\langle -1 \rangle}(x)) = x$. Equivalently, for $y = h(x)$, we obtain the relationship

$$E_y\left[\frac{1}{y - z}\right] = E_x\left[\frac{1}{h(x) - z}\right]. \tag{6.3}$$

The well-known Stieltjes-Perron inversion formula [3]

$$f_A(x) \equiv dF^A(x) = \frac{1}{\pi} \lim_{\xi \to 0^+} \operatorname{Im}\, m_A(x + i\xi) \tag{6.4}$$

can be used to recover the probability density function $f_A(x)$ from the Stieltjes transform. Here and for the remainder of this thesis, the density function is assumed to be the distributional derivative of the distribution function. In a portion of the literature on random matrices, the Cauchy transform is defined as

$$g_A(z) = \int \frac{1}{z - x}\, dF^A(x) \quad \text{for } z \in \mathbb{C}^- \setminus \mathbb{R}.$$

The Cauchy transform is related to the Stieltjes transform, as defined in (6.1), by

$$g_A(z) = -m_A(z). \tag{6.5}$$
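The inversion formula (6.4) can be tried numerically. The following is a minimal Python sketch, not part of the software described in this thesis. It assumes the semi-circle distribution, whose Stieltjes transform solves $m^2 + m z + 1 = 0$ (see Table 6.2), and compares the result with the standard closed-form density $\sqrt{4 - x^2}/(2\pi)$ on $[-2, 2]$, which is quoted here as a known fact rather than taken from this section. The function names and the step `xi` are illustrative choices.

```python
import cmath
import math

def stieltjes_semicircle(z):
    """Root of m^2 + z*m + 1 = 0 with the larger imaginary part.

    For z in the upper half-plane this is the branch with Im m > 0,
    which is the one a Stieltjes transform must take.
    """
    disc = cmath.sqrt(z * z - 4)
    roots = ((-z + disc) / 2, (-z - disc) / 2)
    return max(roots, key=lambda m: m.imag)

def density(x, xi=1e-8):
    """Approximate f(x) = (1/pi) * Im m(x + i*xi) for a small xi > 0, as in (6.4)."""
    return stieltjes_semicircle(complex(x, xi)).imag / math.pi

for x in (0.0, 1.0, 1.9):
    exact = math.sqrt(4 - x * x) / (2 * math.pi)
    print(f"x={x:4.1f}  inversion={density(x):.6f}  exact={exact:.6f}")
```

Choosing the root by its imaginary part is the only non-mechanical step: the polynomial equation has two solution curves, and only one of them is the Stieltjes transform of a probability distribution.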


■  6.1.2  The  moment  transform 


When the probability distribution is compactly supported, the Stieltjes transform can also be expressed as the series expansion

$$m_A(z) = -\frac{1}{z} - \sum_{j=1}^{\infty} \frac{M_j^A}{z^{j+1}}, \tag{6.6}$$

about $z = \infty$, where $M_j^A := \int x^j\, dF^A(x)$ is the $j$-th moment. The ordinary moment generating function, $\mu_A(z)$, is the power series

$$\mu_A(z) = \sum_{j=0}^{\infty} M_j^A\, z^j, \tag{6.7}$$

with $M_0^A = 1$. The moment generating function, referred to as the moment transform, is related to the Stieltjes transform by

$$\mu_A(z) = -\frac{1}{z}\, m_A\!\left(\frac{1}{z}\right). \tag{6.8}$$

The Stieltjes transform can be expressed in terms of the moment transform as

$$m_A(z) = -\frac{1}{z}\, \mu_A\!\left(\frac{1}{z}\right). \tag{6.9}$$

The eta transform, introduced by Tulino and Verdú in [104], is a minor variation of the moment transform. It can be expressed in terms of the Stieltjes transform as

$$\eta_A(z) = \frac{1}{z}\, m_A\!\left(-\frac{1}{z}\right), \tag{6.10}$$

while the Stieltjes transform can be expressed in terms of the eta transform as

$$m_A(z) = -\frac{1}{z}\, \eta_A\!\left(-\frac{1}{z}\right). \tag{6.11}$$
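The relationships (6.6) to (6.10) can be checked on a small case. The sketch below is illustrative Python using exact rational arithmetic. It assumes the two-atom distribution with mass 1/2 at 0 and 1/2 at 1 (the same distribution used in the example around (6.26)); all function names are hypothetical.

```python
from fractions import Fraction

# (location, weight) pairs of the assumed two-atom distribution
ATOMS = [(Fraction(0), Fraction(1, 2)), (Fraction(1), Fraction(1, 2))]

def stieltjes(z):
    """m(z) = sum_i p_i / (lambda_i - z)."""
    return sum(p / (lam - z) for lam, p in ATOMS)

def moment(j):
    """j-th moment M_j = sum_i p_i * lambda_i**j (so M_0 = 1)."""
    return sum(p * lam ** j for lam, p in ATOMS)

def stieltjes_series(z, terms):
    """Truncation of (6.6): -1/z - sum_{j=1}^{terms} M_j / z**(j+1)."""
    return -1 / z - sum(moment(j) / z ** (j + 1) for j in range(1, terms + 1))

def moment_transform(z, terms):
    """Truncation of (6.7): sum_{j=0}^{terms} M_j * z**j."""
    return sum(moment(j) * z ** j for j in range(terms + 1))

def eta(z):
    """Eta transform via (6.10): (1/z) * m(-1/z)."""
    return stieltjes(-1 / z) / z

z = Fraction(10)
print(float(stieltjes(z)), float(stieltjes_series(z, 12)))           # (6.6)
print(float(moment_transform(1 / z, 12)), float(-z * stieltjes(z)))  # (6.8) at argument 1/z
print(eta(Fraction(1)))                                              # equals E[1/(1 + x)]
```

For this distribution the eta transform at $z = 1$ is $0.5 \cdot 1 + 0.5 \cdot \tfrac{1}{2} = \tfrac{3}{4}$, which is what the last line evaluates exactly.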


■  6.1.3  The  R  transform 

The R transform is defined in terms of the Cauchy transform as

$$r_A(z) = g_A^{\langle -1 \rangle}(z) - \frac{1}{z}, \tag{6.12}$$

where $g_A^{\langle -1 \rangle}(z)$ is the functional inverse of $g_A(z)$ with respect to composition. It will often be more convenient to use the expression for the R transform in terms of the Cauchy transform given by

$$r_A(g) = z(g) - \frac{1}{g}. \tag{6.13}$$

The R transform can be written as a power series whose coefficients $K_j^A$ are known as the "free cumulants." For a combinatorial interpretation of free cumulants, see [92]. Thus the R transform is the (ordinary) free cumulant generating function

$$r_A(g) = \sum_{j=0}^{\infty} K_{j+1}^A\, g^j. \tag{6.14}$$
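As a short worked illustration of (6.13), consider the semi-circle distribution, whose Cauchy transform satisfies $g^2 - g z + 1 = 0$ (the $L_{gz}$ entry for the semi-circle in Table 6.2). Solving for $z$ as a function of $g$ avoids any explicit functional inversion:

```latex
g^2 - g z + 1 = 0
\;\Longrightarrow\;
z(g) = g + \frac{1}{g}
\;\Longrightarrow\;
r(g) = z(g) - \frac{1}{g} = g .
```

Comparing with (6.14), the only non-zero free cumulant in this case is $K_2 = 1$, which agrees with the entry $L_{rg} = r - g$ listed for the semi-circle distribution.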
■  6.1.4  The  S  transform

The S transform is relatively more complicated. It is defined as

$$s_A(z) = \frac{1 + z}{z}\, \Upsilon_A^{\langle -1 \rangle}(z), \tag{6.15}$$

where $\Upsilon_A(z)$ can be written in terms of the Stieltjes transform as

$$\Upsilon_A(z) = -\frac{1}{z}\, m_A(1/z) - 1. \tag{6.16}$$

This definition is quite cumbersome to work with because of the functional inverse in (6.15). It also places a technical restriction (to enable series inversion) that $M_1^A \neq 0$. We can, however, avoid this by expressing the S transform algebraically in terms of the Stieltjes transform as shown next. We first substitute $\Upsilon_A(z)$ for the argument $z$ in (6.15) to obtain

$$s_A(\Upsilon_A(z)) = \frac{1 + \Upsilon_A(z)}{\Upsilon_A(z)}\, z.$$

This can be rewritten in terms of $m_A(z)$ using the relationship in (6.16) to obtain

$$s_A\!\left(-\frac{1}{z}\, m(1/z) - 1\right) = \frac{z\, m(1/z)}{m(1/z) + z}$$

or, equivalently:

$$s_A(-z\, m(z) - 1) = \frac{m(z)}{z\, m(z) + 1}. \tag{6.17}$$

We now define $y(z)$ in terms of the Stieltjes transform as $y(z) = -z\, m(z) - 1$. It is clear that $y(z)$ is an invertible function of $m(z)$. The right hand side of (6.17) can be rewritten in terms of $y(z)$ as

$$s_A(y(z)) = -\frac{m(z)}{y(z)} = \frac{m(z)}{z\, m(z) + 1}. \tag{6.18}$$

Equation (6.18) can be rewritten to obtain a simple relationship between the Stieltjes transform and the S transform

$$m_A(z) = -y\, s_A(y). \tag{6.19}$$

Noting that $y = -z\, m(z) - 1$ and $m(z) = -y\, s_A(y)$ we obtain the relationship

$$y = z\, y\, s_A(y) - 1$$

or, equivalently

$$z = \frac{y + 1}{y\, s_A(y)}. \tag{6.20}$$
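To see (6.19) and (6.20) at work, here is a short worked substitution. It assumes the Marčenko-Pastur polynomial $L_{mz} = c z m^2 - (1 - c - z) m + 1$ from Table 6.2 and replaces $m$ by $-y s$ and $z$ by $(y+1)/(y s)$:

```latex
c\,\frac{y+1}{y s}\,(y s)^2 + \Bigl(1 - c - \frac{y+1}{y s}\Bigr)\, y s + 1
  = c\,y\,(y+1)\,s + (1 - c)\,y\,s - (y + 1) + 1
  = y \bigl[(c\,y + 1)\,s - 1\bigr] .
```

Discarding the factor $y$ leaves $(c y + 1) s - 1 = 0$, i.e. $s(y) = 1/(1 + c y)$, obtained without ever inverting $\Upsilon_A$ explicitly.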
■  6.2  The  Algebraic  Framework

Notation 6.21 (Bivariate polynomial). Let $L_{uv}$ denote a bivariate polynomial of degree $D_u$ in $u$ and $D_v$ in $v$ defined as

$$L_{uv} \equiv L_{uv}(\cdot, \cdot) = \sum_{j=0}^{D_u} \sum_{k=0}^{D_v} c_{jk}\, u^j v^k = \sum_{j=0}^{D_u} l_j(v)\, u^j. \tag{6.21}$$

The scalar coefficients $c_{jk}$ are real valued.

The two letter subscripts for the bivariate polynomial $L_{uv}$ provide us with a convention of which dummy variables we will use. We will generically use the first letter in the subscript to represent a transform of the density with the second letter acting as a mnemonic for the dummy variable associated with the transform. By consistently using the same pair of letters to denote the bivariate polynomial that encodes the transform and the associated dummy variable, this abuse of notation allows us to readily identify the encoding of the distribution that is being manipulated.

Remark 6.22 (Irreducibility). Unless otherwise stated it will be understood that $L_{uv}(u,v)$ is "irreducible" in the sense that the conditions:

•  $l_0(v), \ldots, l_{D_u}(v)$ have no common factor involving $v$,

•  $l_{D_u}(v) \neq 0$,

•  $\mathrm{disc}_L(v) \neq 0$,

are satisfied, where $\mathrm{disc}_L(v)$ is the discriminant of $L_{uv}(u,v)$ thought of as a polynomial in $v$.

We are particularly focused on the solution "curves," $u_1(v), \ldots, u_{D_u}(v)$, i.e.,

$$L_{uv}(u,v) = l_{D_u}(v) \prod_{i=1}^{D_u} \left(u - u_i(v)\right).$$

Informally speaking, when we refer to the bivariate polynomial equation $L_{uv}(u,v) = 0$ with solutions $u_i(v)$ we are actually considering the equivalence class of rational functions with this set of solution curves.

Remark 6.23 (Equivalence class). The equivalence class of $L_{uv}(u,v)$ may be characterised as functions of the form $L_{uv}(u,v)\, g(v)/h(u,v)$ where $h$ is relatively prime to $L_{uv}(u,v)$ and $g(v)$ is not identically 0.

A few technicalities (such as poles and singular points) that will be catalogued later in Chapter 8 remain, but this is sufficient for allowing us to introduce rational transformations of the arguments and continue to use the language of polynomials.

Definition 6.24 (Algebraic distributions). Let $F(x)$ be a probability distribution function and $f(x)$ be its distributional derivative (here and henceforth). Consider the Stieltjes transform $m(z)$ of the distribution function, defined as

$$m(z) = \int \frac{1}{x - z}\, dF(x) \quad \text{for } z \in \mathbb{C}^+ \setminus \mathbb{R}. \tag{6.22}$$

If there exists a bivariate polynomial $L_{mz}$ such that $L_{mz}(m(z), z) = 0$ then we refer to $F(x)$ as an algebraic (probability) distribution function, $f(x)$ as an algebraic (probability) density function and say that $f \in \mathcal{P}_{alg}$. Here $\mathcal{P}_{alg}$ denotes the class of algebraic (probability) distributions.

Definition 6.25 (Atomic distribution). Let $F(x)$ be a probability distribution function of the form

$$F(x) = \sum_{i=1}^{K} p_i\, \mathbb{1}_{[\lambda_i, \infty)}(x),$$

where the $K$ atoms at $\lambda_i \in \mathbb{R}$ have (non-negative) weights $p_i$ subject to $\sum_i p_i = 1$ and $\mathbb{1}_{[\lambda_i, \infty)}(x)$ is the indicator (or characteristic) function of the set $[\lambda_i, \infty)$. We refer to $F(x)$ as an atomic (probability) distribution function. Denoting its distributional derivative by $f(x)$, we say that $f(x) \in \mathcal{P}_{atom}$. Here $\mathcal{P}_{atom}$ denotes the class of atomic distributions.

Example 6.26. An atomic probability distribution, as in Definition 6.25, has a Stieltjes transform

$$m(z) = \sum_{i=1}^{K} \frac{p_i}{\lambda_i - z},$$

which is the solution of the equation $L_{mz}(m, z) = 0$ where

$$L_{mz}(m, z) \equiv \prod_{i=1}^{K} (\lambda_i - z)\, m - \sum_{i=1}^{K} p_i \prod_{\substack{j=1 \\ j \neq i}}^{K} (\lambda_j - z).$$

Hence it is an algebraic distribution; consequently $\mathcal{P}_{atom} \subset \mathcal{P}_{alg}$.
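The claim in Example 6.26 is easy to confirm numerically. The sketch below is illustrative Python, not the thesis software. It evaluates the polynomial $L_{mz}$ of Example 6.26 at $(m(z), z)$ for a hypothetical three-atom distribution and a test point $z$ in the upper half-plane, both chosen arbitrarily for illustration.

```python
def stieltjes_atomic(atoms, z):
    """m(z) = sum_i p_i / (lambda_i - z); atoms is a list of (lambda_i, p_i)."""
    return sum(p / (lam - z) for lam, p in atoms)

def L_mz(atoms, m, z):
    """Bivariate polynomial of Example 6.26 evaluated at the point (m, z)."""
    lead = 1
    for lam, _ in atoms:
        lead *= (lam - z)                  # prod_i (lambda_i - z)
    tail = 0
    for i, (_, p_i) in enumerate(atoms):
        term = p_i
        for j, (lam_j, _) in enumerate(atoms):
            if j != i:
                term *= (lam_j - z)        # prod_{j != i} (lambda_j - z)
        tail += term
    return lead * m - tail

atoms = [(-1.0, 0.2), (0.5, 0.3), (2.0, 0.5)]   # hypothetical atoms; weights sum to 1
z = 0.7 + 0.4j                                   # arbitrary point off the real axis
residual = L_mz(atoms, stieltjes_atomic(atoms, z), z)
print(abs(residual))                             # zero up to rounding error
```

The polynomial is linear in $m$, so an atomic distribution has a single solution curve. This is what makes it the simplest member of the algebraic class.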
Example 6.27. The Cauchy distribution whose density

$$f(x) = \frac{1}{\pi (x^2 + 1)}$$

has a Stieltjes transform $m(z)$ which is the solution of the equation $L_{mz}(m, z) = 0$ where

$$L_{mz}(m, z) \equiv (z^2 + 1)\, m^2 + 2 z\, m + 1.$$

Hence it is an algebraic distribution.

It is often the case that the probability density functions of algebraic distributions, according to our definition, will also be algebraic functions themselves. We conjecture that this is a necessary but not sufficient condition. We show that it is not sufficient by providing the counter-example below.

Counter-example 6.28. Consider the quarter-circle distribution with density function

$$f(x) = \frac{\sqrt{4 - x^2}}{\pi} \quad \text{for } x \in [0, 2].$$

Its Stieltjes transform [expression lost in extraction] is clearly not an algebraic function. Thus $f(x) \notin \mathcal{P}_{alg}$.

We now define six interconnected bivariate polynomials denoted by $L_{mz}$, $L_{gz}$, $L_{rg}$, $L_{sy}$, $L_{\mu z}$, and $L_{\eta z}$. We assume that $L_{uv}(u,v)$ is an irreducible bivariate polynomial of the form in (6.21). The main protagonist of the transformations we consider is the bivariate polynomial $L_{mz}$ which implicitly defines the Stieltjes transform $m(z)$ via the equation $L_{mz}(m, z) = 0$. Starting off with this polynomial we can obtain the polynomial $L_{gz}$ using the relationship in (6.5) as

$$L_{gz}(g, z) = L_{mz}(-g, z). \tag{6.23}$$

Perhaps we should explain our abuse of notation once again, for the sake of clarity. Given any one polynomial, all the other polynomials can be obtained. The two letter subscripts not only tell us which of the six polynomials we are focusing on, they also provide a convention of which dummy variables we will use. The first letter in the subscript represents the transform; the second letter is a mnemonic for the variable associated with the transform that we use consistently in the software based on this framework. With this notation in mind, we can obtain the polynomial $L_{rg}$ from $L_{gz}$ using (6.13) as

$$L_{rg}(r, g) = L_{gz}\!\left(g,\; r + \frac{1}{g}\right). \tag{6.24}$$

Similarly, we can obtain the bivariate polynomial $L_{sy}$ from $L_{mz}$ using the expressions in (6.19) and (6.20) to obtain the relationship

$$L_{sy} = L_{mz}\!\left(-y\, s,\; \frac{y + 1}{s\, y}\right). \tag{6.25}$$

[Figure 6-1: diagram of the six bivariate polynomials joined by labelled arrows. Legend: $m(z)$ = Stieltjes transform, $g(z)$ = Cauchy transform, $r(g)$ = R transform, $s(y)$ = S transform, $\mu(z)$ = moment transform, $\eta(z)$ = eta transform.]

Figure 6-1. The six interconnected bivariate polynomials; transformations between the polynomials, indicated by the labelled arrows, are given in Table 6.3.

Based on the transforms discussed in Section 6.1, we can derive transformations between additional pairs of bivariate polynomials represented by the bidirectional arrows in Figure 6-1 and listed in the third column of Table 6.3. Specifically, the expressions in (6.8) and (6.11) can be used to derive the transformations between $L_{mz}$ and $L_{\mu z}$ and between $L_{mz}$ and $L_{\eta z}$, respectively. The fourth column of Table 6.3 lists the Matlab function, implemented using its Maple based Symbolic Toolbox, corresponding to the bivariate polynomial transformations represented in Figure 6-1. In the Matlab functions, the function irreducLuv(u,v) listed in Table 6.1 ensures that the resulting bivariate polynomial is irreducible by clearing the denominator and making the resulting polynomial square free.

Table 6.1. Making Luv irreducible.

    Procedure                             Matlab Code
    ----------------------------------    ----------------------------------------
                                          function Luv = irreducLuv(Luv,u,v)
    Simplify and clear the denominator    L = numden(simplify(expand(Luv)));
    Make square free                      L = Luv / maple('gcd',L,diff(L,u));
                                          L = simplify(expand(L));
                                          L = Luv / maple('gcd',L,diff(L,v));
    Simplify                              Luv = simplify(expand(L));

Example: Consider an atomic probability distribution with

$$F(x) = 0.5\, \mathbb{1}_{[0, \infty)} + 0.5\, \mathbb{1}_{[1, \infty)}, \tag{6.26}$$

whose Stieltjes transform

$$m(z) = \frac{0.5}{0 - z} + \frac{0.5}{1 - z}$$

is the solution of the equation

$$m\,(0 - z)(1 - z) - 0.5\,(1 - 2z) = 0,$$

or equivalently, the solution of the equation $L_{mz}(m, z) = 0$ where

$$L_{mz}(m, z) \equiv m\,(2z^2 - 2z) - (1 - 2z). \tag{6.27}$$

We can obtain the bivariate polynomial $L_{gz}(g, z)$ by applying the transformation in (6.23) to the bivariate polynomial $L_{mz}$ given by (6.27) so that

$$L_{gz}(g, z) = -g\,(2z^2 - 2z) - (1 - 2z). \tag{6.28}$$

Similarly, by applying the transformation in (6.24) we obtain

$$L_{rg}(r, g) = -g \left( 2 \left(r + \frac{1}{g}\right)^{2} - 2 \left(r + \frac{1}{g}\right) \right) - \left( 1 - 2 \left(r + \frac{1}{g}\right) \right), \tag{6.29}$$

which, on clearing the denominator and invoking the equivalence class representation of our polynomials (see Remark 6.23), gives us the irreducible bivariate polynomial

$$L_{rg}(r, g) = -1 + 2 g r^2 + (2 - 2g)\, r. \tag{6.30}$$

By applying the transformation in (6.25) to the bivariate polynomial $L_{mz}$, we obtain

$$L_{sy} = (-s y) \left( 2 \left(\frac{y+1}{s y}\right)^{2} - 2 \left(\frac{y+1}{s y}\right) \right) - \left( 1 - 2\, \frac{y+1}{s y} \right),$$

which on clearing the denominator gives us the irreducible bivariate polynomial

$$L_{sy}(s, y) = (1 + 2y)\, s - 2 - 2y. \tag{6.31}$$

Table 6.2 tabulates the six bivariate polynomial encodings in Figure 6-1 for the distribution in (6.26), the semi-circle distribution for Wigner matrices and the Marčenko-Pastur distribution for Wishart matrices.
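The chain of substitutions in this example can be sanity-checked at a single point. The following Python sketch is illustrative only and uses exact rational arithmetic. It evaluates $m(z)$ for the distribution in (6.26) at the arbitrarily chosen point $z = 3$, derives $g$, $r$, $y$ and $s$ from (6.5), (6.13), (6.17) and (6.18), and confirms that each polynomial from (6.27), (6.28), (6.30) and (6.31) vanishes there.

```python
from fractions import Fraction

def m_of_z(z):
    """Stieltjes transform of F(x) = 0.5*1_[0,inf) + 0.5*1_[1,inf), as in (6.26)."""
    half = Fraction(1, 2)
    return half / (0 - z) + half / (1 - z)

def L_mz(m, z):
    return m * (2 * z**2 - 2 * z) - (1 - 2 * z)        # (6.27)

def L_gz(g, z):
    return -g * (2 * z**2 - 2 * z) - (1 - 2 * z)       # (6.28)

def L_rg(r, g):
    return -1 + 2 * g * r**2 + (2 - 2 * g) * r         # (6.30)

def L_sy(s, y):
    return (1 + 2 * y) * s - 2 - 2 * y                 # (6.31)

z = Fraction(3)        # arbitrary test point
m = m_of_z(z)          # Stieltjes transform value
g = -m                 # Cauchy transform, (6.5)
r = z - 1 / g          # R transform value, (6.13)
y = -z * m - 1         # argument of the S transform, see (6.17)
s = -m / y             # S transform value, (6.18)

print(m, g, r, y, s)
print(L_mz(m, z), L_gz(g, z), L_rg(r, g), L_sy(s, y))  # all four vanish
```

A single-point check does not prove the polynomial identities, but it catches sign and substitution slips cheaply when transcribing the transformations of Table 6.3.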

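(a) The atomic distribution in (6.26).

    L               Bivariate polynomial
    ------------    ------------------------------------------------
    $L_{mz}$        $m(2z^2 - 2z) - (1 - 2z)$
    $L_{gz}$        $-g(2z^2 - 2z) - (1 - 2z)$
    $L_{rg}$        $-1 + 2 g r^2 + (2 - 2g)\, r$
    $L_{sy}$        $(1 + 2y)\, s - 2 - 2y$
    $L_{\mu z}$     $(-2 + 2z)\, \mu + 2 - z$
    $L_{\eta z}$    $(2z + 2)\, \eta - 2 - z$

(b) The Marčenko-Pastur distribution.

    L               Bivariate polynomial
    ------------    ------------------------------------------------
    $L_{mz}$        $c z m^2 - (1 - c - z)\, m + 1$
    $L_{gz}$        $c z g^2 + (1 - c - z)\, g + 1$
    $L_{rg}$        $(c g - 1)\, r + 1$
    $L_{sy}$        $(c y + 1)\, s - 1$
    $L_{\mu z}$     $\mu^2 z c - (z c + 1 - z)\, \mu + 1$
    $L_{\eta z}$    $\eta^2 z c + (-z c + 1 + z)\, \eta - 1$

(c) The semi-circle distribution.

    L               Bivariate polynomial
    ------------    ------------------------------------------------
    $L_{mz}$        $m^2 + m z + 1$
    $L_{gz}$        $g^2 - g z + 1$
    $L_{rg}$        $r - g$
    $L_{sy}$        $s^2 y - 1$
    $L_{\mu z}$     $\mu^2 z^2 - \mu + 1$
    $L_{\eta z}$    $z^2 \eta^2 - \eta + 1$

Table 6.2. Bivariate polynomial representations of some algebraic distributions.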

Columns: Label, Conversion, Transformation, MATLAB Code.

I.  Conversion: $L_{mz} \rightleftharpoons L_{gz}$

    Transformation: $L_{mz} = L_{gz}(-m, z)$

        function Lmz = Lgz2Lmz(Lgz)
        syms m g z
        Lmz = subs(Lgz,g,-m);

    Transformation: $L_{gz} = L_{mz}(-g, z)$

        function Lgz = Lmz2Lgz(Lmz)
        syms m g z
        Lgz = subs(Lmz,m,-g);

II.  Conversion: $L_{gz} \rightleftharpoons L_{rg}$

    Transformation: $L_{gz} = L_{rg}(z - \frac{1}{g}, g)$

        function Lgz = Lrg2Lgz(Lrg)
        syms r g z
        Lgz = subs(Lrg,r,z-1/g);
        Lgz = irreducLuv(Lgz,g,z);

    Transformation: $L_{rg} = L_{gz}(g, r + \frac{1}{g})$

        function Lrg = Lgz2Lrg(Lgz)
        syms r g z
        Lrg = subs(Lgz,z,r+1/g);
        Lrg = irreducLuv(Lrg,r,g);

III.  Conversion: $L_{mz} \rightleftharpoons L_{rg}$

    Transformation: $L_{mz} \rightleftharpoons L_{gz} \rightleftharpoons L_{rg}$

        function Lmz = Lrg2Lmz(Lrg)
        syms m z r g
        Lgz = Lrg2Lgz(Lrg);
        Lmz = Lgz2Lmz(Lgz);

        function Lrg = Lmz2Lrg(Lmz)
        syms m z r g
        Lgz = Lmz2Lgz(Lmz);
        Lrg = Lgz2Lrg(Lgz);

IV.  Conversion: $L_{mz} \rightleftharpoons L_{sy}$

    Transformation: $L_{mz} = L_{sy}\left(\frac{m}{z m + 1}, -z m - 1\right)$

        function Lmz = Lsy2Lmz(Lsy)
        syms m z s y
        Lmz = subs(Lsy,s,m/(z*m+1));
        Lmz = subs(Lmz,y,-z*m-1);
        Lmz = irreducLuv(Lmz,m,z);

    Transformation: $L_{sy} = L_{mz}\left(-y s, \frac{y + 1}{s y}\right)$

        function Lsy = Lmz2Lsy(Lmz)
        syms m z s y
        Lsy = subs(Lmz,m,-y*s);
        Lsy = subs(Lsy,z,(y+1)/y/s);
        Lsy = irreducLuv(Lsy,s,y);

V.  Conversion: $L_{mz} \rightleftharpoons L_{\mu z}$

    Transformation: $L_{mz} = L_{\mu z}(-m z, \frac{1}{z})$

        function Lmz = Lmyuz2Lmz(Lmyuz)
        syms m myu z
        Lmz = subs(Lmyuz,z,1/z);
        Lmz = subs(Lmz,myu,-m*z);
        Lmz = irreducLuv(Lmz,m,z);

    Transformation: $L_{\mu z} = L_{mz}(-\mu z, \frac{1}{z})$

        function Lmyuz = Lmz2Lmyuz(Lmz)
        syms m myu z
        Lmyuz = subs(Lmz,z,1/z);
        Lmyuz = subs(Lmyuz,m,-myu*z);
        Lmyuz = irreducLuv(Lmyuz,myu,z);

VI.  Conversion: $L_{mz} \rightleftharpoons L_{\eta z}$

    Transformation: $L_{mz} = L_{\eta z}(-z m, -\frac{1}{z})$

        function Lmz = Letaz2Lmz(Letaz)
        syms m eta z
        Lmz = subs(Letaz,z,-1/z);
        Lmz = subs(Lmz,eta,-z*m);
        Lmz = irreducLuv(Lmz,m,z);

    Transformation: $L_{\eta z} = L_{mz}(z \eta, -\frac{1}{z})$

        function Letaz = Lmz2Letaz(Lmz)
        syms m eta z
        Letaz = subs(Lmz,z,-1/z);
        Letaz = subs(Letaz,m,z*eta);
        Letaz = irreducLuv(Letaz,eta,z);

Table 6.3. Transformations between the different bivariate polynomials. As a guide to MATLAB notation, the command syms declares a variable to be symbolic while the command subs symbolically substitutes every occurrence of the second argument in the first argument with the third argument. Thus, for example, the command y=subs(x-a,a,10) will yield the output y=x-10 if we have previously declared x and a to be symbolic using the command syms x a.




CHAPTER  6  THE  POLYNOMIAL  METHOD  MATHEMATICAL  FOUNDATION 


■  6.3  Algebraic  manipulations  of  algebraic  functions 

Algebraic functions are closed under addition and multiplication. Hence we can add (or multiply) two algebraic functions and obtain another algebraic function. We show, using purely matrix theoretic arguments, how to obtain the polynomial equation whose solution is the sum (or product) of two algebraic functions without ever actually computing the individual functions. In Section 6.4, we interpret this computation using the concept of resultants [98] from elimination theory. These tools will feature prominently in Chapter 7 when we encode the transformations of the random matrices as algebraic operations on the appropriate form of the bivariate polynomial that encodes their limiting eigenvalue distributions.


Definition 6.31 (Companion Matrix). The companion matrix $C_{a(x)}$ to a monic polynomial

$$a(x) \equiv a_0 + a_1 x + \ldots + a_{n-1} x^{n-1} + x^n$$

is the $n \times n$ square matrix

$$C_{a(x)} = \begin{bmatrix}
0 & \cdots & \cdots & 0 & -a_0 \\
1 & \ddots & & \vdots & -a_1 \\
0 & \ddots & \ddots & \vdots & -a_2 \\
\vdots & \ddots & 1 & 0 & \vdots \\
0 & \cdots & 0 & 1 & -a_{n-1}
\end{bmatrix}$$

with ones on the sub-diagonal and the last column given by the negative coefficients of $a(x)$.

Remark 6.32. The eigenvalues of the companion matrix are the solutions of the equation $a(x) = 0$. This is intimately related to the observation that the characteristic polynomial of the companion matrix equals $a(x)$, i.e.,

$$a(x) = \det(x I_n - C_{a(x)}).$$
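Remark 6.32 can be verified directly on a small case. The sketch below is illustrative Python with exact rational arithmetic. It builds the companion matrix of Definition 6.31 for an assumed cubic with roots 1, 2 and 3, and compares $\det(x I - C)$ with $a(x)$ at a few integer points. The helper names are hypothetical, and the determinant uses a naive Laplace expansion that is only sensible for small matrices.

```python
from fractions import Fraction

def companion(coeffs):
    """Companion matrix of the monic polynomial a_0 + a_1 x + ... + x^n.

    coeffs = [a_0, ..., a_{n-1}]; ones on the sub-diagonal and the last
    column equal to the negated coefficients, as in Definition 6.31.
    """
    n = len(coeffs)
    C = [[Fraction(0)] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = Fraction(1)
    for i in range(n):
        C[i][n - 1] = -Fraction(coeffs[i])
    return C

def det(M):
    """Determinant by Laplace expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def char_poly_at(C, x):
    """Evaluate det(x*I - C) exactly at the number x."""
    n = len(C)
    return det([[(x if i == j else 0) - C[i][j] for j in range(n)] for i in range(n)])

def a(x):
    """a(x) = x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)."""
    return x**3 - 6 * x**2 + 11 * x - 6

C = companion([-6, 11, -6])
print([char_poly_at(C, Fraction(x)) for x in (1, 2, 3, 5)])
print([a(x) for x in (1, 2, 3, 5)])
```

The two printed lists coincide, and the zeros at 1, 2 and 3 are exactly the eigenvalues of the companion matrix.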

Consider the bivariate polynomial $L_{uv}$ as in (6.21). By treating it as a polynomial in $u$ whose coefficients are polynomials in $v$, i.e., by rewriting it as

$$L_{uv}(u, v) \equiv \sum_{j=0}^{D_u} l_j(v)\, u^j, \tag{6.32}$$

we can create a companion matrix $C^{u}_{uv}$ whose characteristic polynomial as a function of $u$ is the bivariate polynomial $L_{uv}$. The companion matrix $C^{u}_{uv}$ is the $D_u \times D_u$ matrix in Table 6.4.

$$C^{u}_{uv} = \begin{bmatrix}
0 & \cdots & \cdots & -l_0(v)/l_{D_u}(v) \\
1 & \ddots & & -l_1(v)/l_{D_u}(v) \\
0 & \ddots & \ddots & -l_2(v)/l_{D_u}(v) \\
\vdots & \ddots & \ddots & \vdots \\
0 & \cdots & 1 & -l_{D_u - 1}(v)/l_{D_u}(v)
\end{bmatrix}$$

Matlab code:

    function Cu = Luv2Cu(Luv,u)
    Du = double(maple('degree',Luv,u));
    LDu = maple('coeff',Luv,u,Du);
    Cu = sym(zeros(Du)) + ...
         diag(ones(Du-1,1),-1);
    for Di = 0:Du-1
        LtuDi = maple('coeff',Luv,u,Di);
        Cu(Di+1,Du) = -LtuDi/LDu;
    end

Table 6.4. The companion matrix $C^{u}_{uv}$, with respect to $u$, of the bivariate polynomial $L_{uv}$ given by (6.32).

Remark 6.33. Analogous to the univariate case, the characteristic polynomial of $C^{u}_{uv}$ is $\det(u I - C^{u}_{uv}) = L_{uv}(u, v)/l_{D_u}(v)$. Since $l_{D_u}(v)$ is not identically zero, we say that $\det(u I - C^{u}_{uv}) \equiv L_{uv}(u, v)$ where the equality is understood to be with respect to the equivalence class of $L_{uv}$ as in Remark 6.23. The eigenvalues of $C^{u}_{uv}$ are the solutions of the algebraic equation $L_{uv}(u, v) = 0$; specifically, we obtain the algebraic function $u(v)$.


Definition 6.34 (Kronecker product). If $A_m$ (with entries $a_{ij}$) is an $m \times m$ matrix and $B_n$ is an $n \times n$ matrix then the Kronecker (or tensor) product of $A_m$ and $B_n$, denoted by $A_m \otimes B_n$, is the $mn \times mn$ matrix defined as:

$$A_m \otimes B_n = \begin{bmatrix}
a_{11} B_n & \cdots & a_{1m} B_n \\
\vdots & \ddots & \vdots \\
a_{m1} B_n & \cdots & a_{mm} B_n
\end{bmatrix}$$

Lemma 6.35. If $\alpha_i$ and $\beta_j$ are the eigenvalues of $A_m$ and $B_n$ respectively, then

1.  $\alpha_i + \beta_j$ is an eigenvalue of $(A_m \otimes I_n) + (I_m \otimes B_n)$,

2.  $\alpha_i \beta_j$ is an eigenvalue of $A_m \otimes B_n$,

for $i = 1, \ldots, m$, $j = 1, \ldots, n$.

Proof. This is a standard result in linear algebra that may be found in several standard texts including [48]. □

Proposition 6.36. Let $u_1(v)$ be a solution of the algebraic equation $L^1_{uv}(u, v) = 0$, or equivalently an eigenvalue of the $D^1_u \times D^1_u$ companion matrix $C^{u1}_{uv}$. Let $u_2(v)$ be a solution of the algebraic equation $L^2_{uv}(u, v) = 0$, or equivalently an eigenvalue of the $D^2_u \times D^2_u$ companion matrix $C^{u2}_{uv}$. Then

1.  $u_3(v) = u_1(v) + u_2(v)$ is an eigenvalue of the matrix $C^{u3}_{uv} = (C^{u1}_{uv} \otimes I_{D^2_u}) + (I_{D^1_u} \otimes C^{u2}_{uv})$,

2.  $u_3(v) = u_1(v)\, u_2(v)$ is an eigenvalue of the matrix $C^{u3}_{uv} = C^{u1}_{uv} \otimes C^{u2}_{uv}$.

Equivalently $u_3(v)$ is a solution of the algebraic equation $L^3_{uv} = 0$ where $L^3_{uv} = \det(u I - C^{u3}_{uv})$.

Proof. This follows directly from Lemma 6.35. □
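Lemma 6.35 and Proposition 6.36 can be illustrated with two univariate polynomials, that is, with the variable $v$ held fixed. The Python sketch below is an illustration only, with hypothetical helper names. It takes the companion matrices of $x^2 - 3x + 2$ (roots 1, 2) and $x^2 - 7x + 12$ (roots 3, 4), forms the Kronecker sum and the Kronecker product, and evaluates their characteristic polynomials exactly.

```python
from fractions import Fraction

def companion2(a0, a1):
    """Companion matrix of the monic quadratic x^2 + a1*x + a0."""
    return [[Fraction(0), Fraction(-a0)],
            [Fraction(1), Fraction(-a1)]]

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def kron(A, B):
    """Kronecker product: block (i, j) of the result equals A[i][j] * B."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def add(A, B):
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

def det(M):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def char_poly_at(C, t):
    """Evaluate det(t*I - C) exactly at the number t."""
    n = len(C)
    return det([[(t if i == j else 0) - C[i][j] for j in range(n)] for i in range(n)])

C1 = companion2(2, -3)     # x^2 - 3x + 2, roots 1 and 2
C2 = companion2(12, -7)    # x^2 - 7x + 12, roots 3 and 4
C_sum = add(kron(C1, identity(2)), kron(identity(2), C2))
C_prod = kron(C1, C2)

print([char_poly_at(C_sum, t) for t in (4, 5, 6)])       # sums of roots
print([char_poly_at(C_prod, t) for t in (3, 4, 6, 8)])   # products of roots
```

Both printed lists consist of zeros: every pairwise sum (4, 5, 5, 6) is an eigenvalue of the Kronecker sum and every pairwise product (3, 4, 6, 8) is an eigenvalue of the Kronecker product, and the individual roots are never needed to build the matrices.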

We represent the binary addition and multiplication operators on the space of algebraic functions by the symbols $\boxplus_u$ and $\boxtimes_u$ respectively. We define addition and multiplication as in Table 6.5 by applying Proposition 6.36. Note that the subscript 'u' in $\boxplus_u$ and $\boxtimes_u$ provides us with an indispensable convention of which dummy variable we are using. Table 6.6 illustrates the $\boxplus$ and $\boxtimes$ operations on a pair of bivariate polynomials and underscores the importance of the symbolic software developed. The $(D_u + 1) \times (D_v + 1)$ matrix $T_{uv}$ lists only the coefficients $c_{ij}$ for the term $u^i v^j$ in the polynomial $L_{uv}(u, v)$. Note that the indexing for $i$ and $j$ starts with zero.


Operation: $L^1_{uv}, L^2_{uv} \mapsto L^3_{uv}$

Addition. $L^3_{uv} = L^1_{uv} \boxplus_u L^2_{uv} \equiv \det(u I - C^{u3}_{uv})$, where

$$C^{u3}_{uv} = \begin{cases}
2\, C^{u1}_{uv} & \text{if } L^1_{uv} = L^2_{uv}, \\
(C^{u1}_{uv} \otimes I_{D^2_u}) + (I_{D^1_u} \otimes C^{u2}_{uv}) & \text{otherwise.}
\end{cases}$$

Matlab code:

    function Luv3 = L1plusL2(Luv1,Luv2,u)
    Cu1 = Luv2Cu(Luv1,u);
    if (Luv1 == Luv2)
        Cu3 = 2*Cu1;
    else
        Cu2 = Luv2Cu(Luv2,u);
        Cu3 = kron(Cu1,eye(length(Cu2))) + ...
              kron(eye(length(Cu1)),Cu2);
    end
    Luv3 = det(u*eye(length(Cu3))-Cu3);

Multiplication. $L^3_{uv} = L^1_{uv} \boxtimes_u L^2_{uv} \equiv \det(u I - C^{u3}_{uv})$, where

$$C^{u3}_{uv} = \begin{cases}
(C^{u1}_{uv})^2 & \text{if } L^1_{uv} = L^2_{uv}, \\
C^{u1}_{uv} \otimes C^{u2}_{uv} & \text{otherwise.}
\end{cases}$$

Matlab code:

    function Luv3 = L1timesL2(Luv1,Luv2,u)
    Cu1 = Luv2Cu(Luv1,u);
    if (Luv1 == Luv2)
        Cu3 = Cu1^2;
    else
        Cu2 = Luv2Cu(Luv2,u);
        Cu3 = kron(Cu1,Cu2);
    end
    Luv3 = det(u*eye(length(Cu3))-Cu3);

Table 6.5. Formal and computational description of the $\boxplus_u$ and $\boxtimes_u$ operators acting on the bivariate polynomials $L^1_{uv}(u, v)$ and $L^2_{uv}(u, v)$ where $C^{u1}_{uv}$ and $C^{u2}_{uv}$ are their corresponding companion matrices constructed as in Table 6.4 and $\otimes$ is the matrix Kronecker product.


■  6.4  Algebraic  manipulations  using  the  resultant 

Addition (and multiplication) of algebraic functions produces another algebraic function. We now demonstrate how the concept of resultants from elimination theory can be used to obtain the polynomial whose zero set is the required algebraic function.


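Definition 6.41 (Resultant). Given a polynomial

$$a(x) \equiv a_0 + a_1 x + \ldots + a_{n-1} x^{n-1} + a_n x^n$$

of degree $n$ with roots $\alpha_i$, for $i = 1, \ldots, n$, and a polynomial

$$b(x) \equiv b_0 + b_1 x + \ldots + b_{m-1} x^{m-1} + b_m x^m$$

of degree $m$ with roots $\beta_j$, for $j = 1, \ldots, m$, the resultant is defined as

$$\mathrm{Res}_x\left(a(x), b(x)\right) = a_n^m\, b_m^n \prod_{i=1}^{n} \prod_{j=1}^{m} (\beta_j - \alpha_i).$$

(a) The two polynomials, their coefficient arrays $T_{uv}$ (rows indexed by $1, u, u^2$; columns by $1, v, v^2$) and their companion matrices with respect to $u$ and $v$.

$L^1_{uv} \equiv u^2 v + u\,(1 - v) + v^2$:

$$T_{uv} = \begin{bmatrix} 0 & 0 & 1 \\ 1 & -1 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \qquad
C^{u}_{uv} = \begin{bmatrix} 0 & -v \\ 1 & \frac{-1 + v}{v} \end{bmatrix}, \qquad
C^{v}_{uv} = \begin{bmatrix} 0 & -u \\ 1 & -u^2 + u \end{bmatrix}$$

$L^2_{uv} \equiv u^2 (v^2 - 3v + 1) + u\,(1 + v) + v^2$:

$$T_{uv} = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & -3 & 1 \end{bmatrix}, \qquad
C^{u}_{uv} = \begin{bmatrix} 0 & \frac{-v^2}{v^2 - 3v + 1} \\ 1 & \frac{-1 - v}{v^2 - 3v + 1} \end{bmatrix}, \qquad
C^{v}_{uv} = \begin{bmatrix} 0 & \frac{-u^2 - u}{u^2 + 1} \\ 1 & \frac{3u^2 - u}{u^2 + 1} \end{bmatrix}$$

(b) [Coefficient arrays $T_{uv}$ of the polynomials obtained by applying the $\boxplus$ and $\boxtimes$ operations to $L^1_{uv}$ and $L^2_{uv}$; the entries are not recoverable from the extracted text.]

Table 6.6. Examples of $\boxplus$ and $\boxtimes$ operations on a pair of bivariate polynomials, $L^1_{uv}$ and $L^2_{uv}$.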

From a computational standpoint, the resultant can be directly computed from the coefficients of the polynomials themselves. The computation involves the formation of the Sylvester matrix and exploiting an identity that relates the determinant of the Sylvester matrix to the resultant.

Definition 6.42 (Sylvester matrix). Given polynomials $a(x)$ and $b(x)$ with degree $n$ and $m$ respectively and coefficients as in Definition 6.41, the Sylvester matrix is the $(n + m) \times (n + m)$ matrix

$$S(a, b) = \begin{bmatrix}
a_n & 0 & \cdots & 0 & 0 & b_m & 0 & \cdots & 0 & 0 \\
a_{n-1} & a_n & \cdots & 0 & 0 & b_{m-1} & b_m & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & a_0 & a_1 & 0 & 0 & \cdots & b_0 & b_1 \\
0 & 0 & \cdots & 0 & a_0 & 0 & 0 & \cdots & 0 & b_0
\end{bmatrix}$$

Proposition 6.43. The resultant of two polynomials $a(x)$ and $b(x)$ is related to the determinant of the Sylvester matrix by

$$\det(S(a, b)) = \mathrm{Res}_x\left(a(x), b(x)\right).$$

Proof. This identity can be proved using standard linear algebra arguments. A proof may be found in [4]. □
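Proposition 6.43 can be checked on a small case. The Python sketch below is illustrative only. It builds the Sylvester matrix in the column layout of Definition 6.42 for two assumed quadratics with known roots and compares its determinant with the product formula of Definition 6.41. Sign conventions for the resultant differ between references by a factor $(-1)^{nm}$; for two quadratics ($nm = 4$) the conventions coincide, which is why this example was chosen.

```python
from fractions import Fraction

def sylvester(a, b):
    """Sylvester matrix of a and b, given as coefficient lists [c_0, ..., c_deg].

    The first m columns hold shifted copies of a_n, ..., a_0 and the last
    n columns hold shifted copies of b_m, ..., b_0 (n = deg a, m = deg b).
    """
    n, m = len(a) - 1, len(b) - 1
    S = [[Fraction(0)] * (n + m) for _ in range(n + m)]
    for k in range(m):
        for d in range(n + 1):
            S[k + d][k] = Fraction(a[n - d])
    for k in range(n):
        for d in range(m + 1):
            S[k + d][m + k] = Fraction(b[m - d])
    return S

def det(M):
    """Determinant via exact Gaussian elimination with row swaps."""
    M = [row[:] for row in M]
    n = len(M)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            result = -result
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= factor * M[c][k]
    return result

a = [2, -3, 1]            # a(x) = x^2 - 3x + 2, roots 1 and 2
b = [12, -7, 1]           # b(x) = x^2 - 7x + 12, roots 3 and 4
alphas, betas = [1, 2], [3, 4]

product_formula = a[-1] ** 2 * b[-1] ** 2
for alpha in alphas:
    for beta in betas:
        product_formula *= (beta - alpha)

print(det(sylvester(a, b)), product_formula)
```

Both quantities equal 12 here. The determinant needs only the coefficients, whereas the product formula needs the roots, which is the computational point of the proposition.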

For our purpose, the utility of this definition is that the $\boxplus_u$ and $\boxtimes_u$ operations can be expressed in terms of resultants. Suppose we are given two bivariate polynomials $L^1_{uv}$ and $L^2_{uv}$. By using the definition of the resultant and treating the bivariate polynomials as polynomials in $u$ whose coefficients are polynomials in $v$, we obtain the identities

$$L^3_{uv}(t, v) = L^1_{uv} \boxplus_u L^2_{uv} \equiv \mathrm{Res}_u\left( L^1_{uv}(t - u, v),\; L^2_{uv}(u, v) \right), \tag{6.33}$$

and

$$L^3_{uv}(t, v) = L^1_{uv} \boxtimes_u L^2_{uv} \equiv \mathrm{Res}_u\left( u^{D^1_u}\, L^1_{uv}(t/u, v),\; L^2_{uv}(u, v) \right), \tag{6.34}$$

where $D^1_u$ is the degree of $L^1_{uv}$ with respect to $u$. By Proposition 6.43, evaluating the $\boxplus_u$ and $\boxtimes_u$ operations via the resultant formulation involves computing the determinant of the $(D^1_u + D^2_u) \times (D^1_u + D^2_u)$ Sylvester matrix. When $L^1_{uv} \neq L^2_{uv}$, this results in a steep computational saving relative to the companion matrix based formulation in Table 6.5 which involves computing the determinant of a $(D^1_u D^2_u) \times (D^1_u D^2_u)$ matrix. Fast algorithms for computing the resultant exploit this and other properties of the Sylvester matrix formulation. In Maple, the computation $L^3_{uv} = L^1_{uv} \boxplus_u L^2_{uv}$ may be performed using the command:

Luv3 = subs(t=u,resultant(subs(u=t-u,Luv1),Luv2,u));

The computation $L^3_{uv} = L^1_{uv} \boxtimes_u L^2_{uv}$ can be performed via the sequence of commands:

Du1 = degree(Luv1,u);

Luv3 = subs(t=u,resultant(simplify(u^Du1*subs(u=t/u,Luv1)),Luv2,u));

When $L^1_{uv} = L^2_{uv}$, however, the $\boxplus_u$ and $\boxtimes_u$ operations are best performed using the companion matrix formulation in Table 6.5. The software implementation of the operations in Table 6.5 in [73] uses the companion matrix formulation when $L^1_{uv} = L^2_{uv}$ and the resultant formulation otherwise.
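The identities (6.33) and (6.34) can be exercised numerically by fixing $v$ and evaluating the resultant at chosen values of $t$. The Python sketch below is an illustration under stated assumptions, not the Maple implementation referred to above. It uses two univariate quadratics with roots 1, 2 and 3, 4, a Sylvester matrix in the column layout of Definition 6.42, and hypothetical helper names throughout.

```python
from fractions import Fraction
from math import comb

def sylvester(a, b):
    """Sylvester matrix of coefficient lists a, b (lowest degree first)."""
    n, m = len(a) - 1, len(b) - 1
    S = [[Fraction(0)] * (n + m) for _ in range(n + m)]
    for k in range(m):
        for d in range(n + 1):
            S[k + d][k] = Fraction(a[n - d])
    for k in range(n):
        for d in range(m + 1):
            S[k + d][m + k] = Fraction(b[m - d])
    return S

def det(M):
    """Determinant via exact Gaussian elimination with row swaps."""
    M = [row[:] for row in M]
    n, result = len(M), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            result = -result
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= factor * M[c][k]
    return result

def reflect_shift(a, t):
    """Coefficients in u (lowest degree first) of a(t - u) for a fixed number t."""
    n, t = len(a) - 1, Fraction(t)
    return [(-1) ** j * sum(Fraction(a[k]) * comb(k, j) * t ** (k - j)
                            for k in range(j, n + 1))
            for j in range(n + 1)]

def reverse_scale(a, t):
    """Coefficients in u (lowest degree first) of u^n * a(t / u) for a fixed number t."""
    n, t = len(a) - 1, Fraction(t)
    return [Fraction(a[n - j]) * t ** (n - j) for j in range(n + 1)]

a = [2, -3, 1]      # x^2 - 3x + 2, roots 1 and 2
b = [12, -7, 1]     # x^2 - 7x + 12, roots 3 and 4

def box_plus(t):
    """Resultant of a(t - u) and b(u) in u, as in (6.33), at a fixed t."""
    return det(sylvester(reflect_shift(a, t), b))

def box_times(t):
    """Resultant of u^2 * a(t / u) and b(u) in u, as in (6.34), at a fixed t."""
    return det(sylvester(reverse_scale(a, t), b))

print([box_plus(t) for t in (4, 5, 6)])       # t equal to a sum of roots
print([box_times(t) for t in (3, 4, 6, 8)])   # t equal to a product of roots
```

The resultant vanishes exactly when the two polynomials in $u$ share a root, which happens when $t$ is a sum (respectively a product) of a root of $a$ and a root of $b$. Here the Sylvester matrices are $4 \times 4$, the same size as the Kronecker matrices for two quadratics; the saving described in the text appears for larger degrees, where $D^1_u + D^2_u$ is much smaller than $D^1_u D^2_u$.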

In this chapter we established our ability to encode algebraic distributions as solutions of bivariate polynomial equations and to manipulate the solutions. This sets the stage for defining the class of "algebraic" random matrices next.
Chapter  7


The  polynomial  method: 
Algebraic  random  matrices 


■  7.1  Motivation 

A random matrix is a matrix whose elements are random variables. Let $A_N$ be an $N \times N$ symmetric/Hermitian random matrix. Its empirical distribution function (e.d.f.) is given by

$$F^{A_N}(x) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}_{[\lambda_i, \infty)}(x), \tag{7.1}$$

where $\lambda_1, \ldots, \lambda_N$ are the eigenvalues of $A_N$ (counted with multiplicity) and $\mathbb{1}_{[\lambda_i, \infty)}(x) = 1$ when $x \geq \lambda_i$ and zero otherwise. For a large class of random matrices, the empirical distribution function $F^{A_N}(x)$ converges, for every $x$, almost surely (or in probability) as $N \to \infty$ to a non-random distribution function $F^A(x)$. The associated eigenvalue density function, denoted by $f_A(x)$, is its distributional derivative.
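The definition (7.1) translates directly into code. The following is a minimal Python sketch, assuming the eigenvalues are already available as a list of real numbers; the eigenvalues shown are hypothetical and the function name is illustrative.

```python
from bisect import bisect_right

def empirical_distribution(eigenvalues):
    """Return the e.d.f. F(x) = (1/N) * #{i : lambda_i <= x}, as in (7.1)."""
    lam = sorted(eigenvalues)
    n = len(lam)

    def F(x):
        # bisect_right counts the sorted entries that are <= x
        return bisect_right(lam, x) / n

    return F

F = empirical_distribution([2.0, -1.0, 0.5, 0.5])   # hypothetical eigenvalues
print(F(-2.0), F(-1.0), F(0.5), F(3.0))             # 0.0 0.25 0.75 1.0
```

The repeated eigenvalue 0.5 contributes a jump of size 2/N, which is how multiplicity enters the e.d.f.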

We are interested in identifying canonical random matrix operations for which the limiting eigenvalue distribution of the resulting matrix is an algebraic distribution. This is equivalent to identifying operations for which the transformations in the random matrices can be mapped into transformations of the bivariate polynomial that encodes the limiting eigenvalue distribution function. This motivates the construction of the class of "algebraic" random matrices which we shall define next.

The practical utility of this definition, which will become apparent in Chapters 8 and 9, can be succinctly summarized: if a random matrix is shown to be algebraic then its limiting eigenvalue density function can be computed using a simple root-finding algorithm. Furthermore, if the moments exist, they will satisfy a finite depth linear recursion (see Theorem 8.36) with polynomial coefficients so that we will often be able to enumerate them efficiently in closed form. Algebraicity of a random matrix thus acts as a certificate of the computability of its limiting eigenvalue density function and the associated moments. In this chapter our objective is to specify the class of algebraic random matrices by its generators.




■  7.2  Definitions 

Let  A for  N  —  1,2,...  be  a  sequence  of  N  x  N  random  matrices  with  real  eigenvalues. 
Let  FAn  denote  the  e.d,f.,  as  in  (7.1).  Suppose  FAn(x)  converges  almost  surely  (or  in 
probability),  for  every  xr  to  FA(x)  as  A7  — *  oc,  then  we  say  that.  Ay  j— *  A .  We  denote 
the  associated  (non-random)  limiting  probability  density  function  by  /a{- 3'). 

Notation  7.21  (Mode  of  convergence  of  the  empirical  distribution  function). 

When  necessary  we  highlight  the  inode  of  convergence  of  the  underlying  distribution 
function  thus :  if  A/v  ^  A  then  it  is  shorthand  for  the  statement  that  the  empir¬ 
ical  distribution  junction  of  An  converges  almost  surely  to  the  distribution  function 
Fa;  likewise  A/v  A  is  shorthand  for  the  statement  that  the  empirical  distribution 
function  of  Ay  converges  in  probability  to  the  distribution  function  FA .  When  the 
distinction  is  not  made  then  almost  sure  convergence  is  assumed . 

Remark  7.22.  The  element  A  above  is  not  to  be  interpreted  as  a  matrix.  There  is  no 
convergence  in  the  sense  of  an  ooxoo  matrix .  The  notation  Ay  A  is  shorthand 
for  describing  the  convergence  of  the  associated  distribution  functions  and  not  of  the 
matrix  itself.  We  think  of  A  as  being  an  (abstract)  element  of  a  probability  space  with 
distribution  function  FA  and  associated  density  function  f  a  * 

Definition 7.23 (Atomic random matrix). If $f_A \in \mathcal{P}_{\mathrm{atom}}$ then we say that $A_N$
is an atomic random matrix. We represent this as $A_N \mapsto A \in \mathcal{M}_{\mathrm{atom}}$ where $\mathcal{M}_{\mathrm{atom}}$
denotes the class of atomic random matrices.

Definition 7.24 (Algebraic random matrix). If $f_A \in \mathcal{P}_{\mathrm{alg}}$ then we say that $A_N$ is
an algebraically characterizable random matrix (often suppressing the word characterizable
for brevity). We represent this as $A_N \mapsto A \in \mathcal{M}_{\mathrm{alg}}$ where $\mathcal{M}_{\mathrm{alg}}$ denotes the class
of algebraic random matrices. Note that, by definition, $\mathcal{M}_{\mathrm{atom}} \subset \mathcal{M}_{\mathrm{alg}}$.

The ability to describe the class of algebraic random matrices and the technique
needed to compute the associated bivariate polynomial is at the crux of our investigation.
In the theorems that follow, we accomplish the former by cataloguing random matrix
operations that preserve algebraicity of the limiting distribution. The following
property of the convergence of distributions will prove useful.


Proposition 7.25 (Continuous mapping theorem). Let $A_N \mapsto A$. Let $f_A$ and
$S_A^{\delta}$ denote the corresponding limiting density function and the atomic component of the
support, respectively. Consider the mapping $y = h(x)$ continuous everywhere on the
real line except on the set of its discontinuities denoted by $D_h$. If $D_h \cap S_A^{\delta} = \emptyset$ then
$B_N = h(A_N) \mapsto B$. The associated non-random distribution function $F^B$ is given by
$F^B(y) = F^A\left(h^{\langle -1 \rangle}(y)\right)$. The associated probability density function is its distributional
derivative.

Proof. This is a restatement of the continuous mapping theorem, which follows from
well-known facts about the convergence of distributions [17]. □
■  7.3  Deterministic operations on algebraic random matrices

We first consider some simple deterministic transformations on an algebraic random
matrix $A_N$ that produce an algebraic random matrix $B_N$.

Theorem 7.31. Let $A_N \mapsto A \in \mathcal{M}_{\mathrm{alg}}$ and $p$, $q$, $r$, and $s$ be real-valued scalars. Then,
$$B_N = (p A_N + q I_N)/(r A_N + s I_N) \mapsto B \in \mathcal{M}_{\mathrm{alg}},$$
provided $f_A$ does not contain an atom at $-s/r$ and $r$, $s$ are not zero simultaneously.


PROOF. Here we have $h(x) = (px + q)/(rx + s)$, which is continuous everywhere except
at $x = -s/r$ for $s$ and $r$ not simultaneously zero. From Proposition 7.25, unless $f_A(x)$
has an atomic component at $-s/r$, $B_N \mapsto B$. The Stieltjes transform of $F^B$ can be
expressed as

$$m_B(z) = \int \frac{1}{h(x) - z}\, dF^A(x) = \int \frac{rx + s}{px + q - z(rx + s)}\, dF^A(x). \tag{7.2}$$

Equation (7.2) can be rewritten as

$$m_B(z) = \int \frac{rx + s}{(p - rz)x + (q - sz)}\, dF^A(x) = \frac{1}{p - rz} \int \frac{rx + s}{x + \frac{q - sz}{p - rz}}\, dF^A(x). \tag{7.3}$$

With some algebraic manipulations, we can rewrite (7.3) as

$$m_B(z) = \beta_z \int \frac{rx + s}{x + \alpha_z}\, dF^A(x)
= \beta_z \left( r \int \frac{x}{x + \alpha_z}\, dF^A(x) + s \int \frac{1}{x + \alpha_z}\, dF^A(x) \right)$$
$$= \beta_z \left( r \int dF^A(x) - r \alpha_z \int \frac{1}{x + \alpha_z}\, dF^A(x) + s \int \frac{1}{x + \alpha_z}\, dF^A(x) \right), \tag{7.4}$$

where $\beta_z = 1/(p - rz)$ and $\alpha_z = (q - sz)/(p - rz)$. Using the definition of the Stieltjes
transform and the identity $\int dF^A(x) = 1$, we can express $m_B(z)$ in (7.4) in terms of
$m_A(z)$ as

$$m_B(z) = \beta_z r + (\beta_z s - \beta_z r \alpha_z)\, m_A(-\alpha_z). \tag{7.5}$$

Equation (7.5) can, equivalently, be rewritten as

$$m_A(-\alpha_z) = \frac{m_B(z) - \beta_z r}{\beta_z s - \beta_z r \alpha_z}. \tag{7.6}$$

Equation (7.6) can be expressed as an operational law on $L^A_{mz}$ as

$$L^B_{mz}(m, z) = L^A_{mz}\left( \frac{m - \beta_z r}{\beta_z s - \beta_z r \alpha_z},\; -\alpha_z \right). \tag{7.7}$$

Since $L^A_{mz}$ exists, we can obtain $L^B_{mz}$ by applying the transformation in (7.7), and
clearing the denominator to obtain the irreducible bivariate polynomial consistent with
Remark 6.23. Since $L^B_{mz}$ exists, this proves that $f_B \in \mathcal{P}_{\mathrm{alg}}$ and $B_N \mapsto B \in \mathcal{M}_{\mathrm{alg}}$. □
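The relationship (7.5) can be checked numerically on a purely atomic distribution, for which the Stieltjes transform is a finite sum. The following is a minimal sketch, not part of the original development: the atom locations, weights, scalars and test point are hypothetical values chosen only for illustration, and the helper names `stieltjes` and `mobius_law` are ours.

```python
def stieltjes(atoms, weights, z):
    """m(z) = sum_i w_i / (x_i - z) for a discrete distribution function."""
    return sum(w / (x - z) for x, w in zip(atoms, weights))


def mobius_law(atoms, weights, p, q, r, s, z):
    """Right-hand side of (7.5): beta*r + (beta*s - beta*r*alpha) * m_A(-alpha)."""
    beta = 1.0 / (p - r * z)
    alpha = (q - s * z) / (p - r * z)
    return beta * r + (beta * s - beta * r * alpha) * stieltjes(atoms, weights, -alpha)


atoms_a = [0.5, 1.0, 3.0]        # hypothetical atom locations of the limit of A
weights = [0.2, 0.5, 0.3]        # hypothetical atom weights, summing to one
p, q, r, s = 2.0, 1.0, 1.0, 4.0  # hypothetical scalars; -s/r = -4 is not an atom
z = 0.3 + 0.7j                   # test point off the real axis

# Direct route: push every atom through h(x) = (p x + q) / (r x + s).
atoms_b = [(p * x + q) / (r * x + s) for x in atoms_a]
m_b_direct = stieltjes(atoms_b, weights, z)

# Route via the operational relationship (7.5).
m_b_law = mobius_law(atoms_a, weights, p, q, r, s, z)

print(abs(m_b_direct - m_b_law))  # difference is at rounding level
```

Agreement of the two routes at a few test points is of course only a sanity check on the algebra, not a proof.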

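Appropriate substitutions for the scalars $p$, $q$, $r$ and $s$ in Theorem 7.31 lead to the
following corollary.

Corollary 7.32. Let $A_N \mapsto A \in \mathcal{M}_{\mathrm{alg}}$ and let $\alpha$ be a real-valued scalar. Then,

1. $B_N = A_N^{-1} \mapsto B \in \mathcal{M}_{\mathrm{alg}}$, provided $f_A$ does not contain an atom at 0,

2. $B_N = \alpha A_N \mapsto B \in \mathcal{M}_{\mathrm{alg}}$,

3. $B_N = A_N + \alpha I_N \mapsto B \in \mathcal{M}_{\mathrm{alg}}$.

Theorem 7.33. Let $X_{n,N}$ be an $n \times N$ matrix. If $A_n = X_{n,N} X'_{n,N} \mapsto A \in \mathcal{M}_{\mathrm{alg}}$, then
$$B_N = X'_{n,N} X_{n,N} \mapsto B \in \mathcal{M}_{\mathrm{alg}}.$$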


Proof. Here $X_{n,N}$ is an $n \times N$ matrix, so that $A_n$ and $B_N$ are $n \times n$ and $N \times N$ sized
matrices respectively. Let $c_N = n/N$. When $c_N < 1$, $B_N$ will have $N - n$ eigenvalues
of magnitude zero while the remaining $n$ eigenvalues will be identically equal to the
eigenvalues of $A_n$. Thus, the e.d.f. of $B_N$ is related to the e.d.f. of $A_n$ as

$$F^{B_N}(x) = \frac{N - n}{N}\, I_{[0,\infty)} + \frac{n}{N}\, F^{A_n}(x) = (1 - c_N)\, I_{[0,\infty)} + c_N F^{A_n}(x), \tag{7.8}$$

where $I_{[0,\infty)}$ is the indicator function that is equal to 1 when $x \ge 0$ and is equal to zero
otherwise.

Similarly, when $c_N > 1$, $A_n$ will have $n - N$ eigenvalues of magnitude zero while
the remaining $N$ eigenvalues will be identically equal to the eigenvalues of $B_N$. Thus
the e.d.f. of $A_n$ is related to the e.d.f. of $B_N$ as

$$F^{A_n}(x) = \frac{n - N}{n}\, I_{[0,\infty)} + \frac{N}{n}\, F^{B_N}(x) = \left(1 - \frac{1}{c_N}\right) I_{[0,\infty)} + \frac{1}{c_N}\, F^{B_N}(x). \tag{7.9}$$

Equation (7.9) is (7.8) rearranged, so we do not need to differentiate between the case
when $c_N < 1$ and $c_N > 1$.

Thus, as $n, N \to \infty$ with $c_N = n/N \to c$, if $F^{A_n}$ converges to a non-random d.f.
$F^A$, then $F^{B_N}$ will also converge to a non-random d.f. $F^B$ related to $F^A$ by

$$F^B(x) = (1 - c)\, I_{[0,\infty)} + c\, F^A(x). \tag{7.10}$$

From (7.10), it is evident that the Stieltjes transforms of the limiting distribution
functions $F^A$ and $F^B$ are related as

$$m_A(z) = -\left(1 - \frac{1}{c}\right) \frac{1}{z} + \frac{1}{c}\, m_B(z). \tag{7.11}$$

Rearranging the terms on either side of (7.11) allows us to express $m_B(z)$ in terms of
$m_A(z)$ as

$$m_B(z) = -\frac{1 - c}{z} + c\, m_A(z). \tag{7.12}$$

Equation (7.12) can be expressed as an operational law on $L^A_{mz}$ as

$$L^B_{mz}(m, z) = L^A_{mz}\left( -\left(1 - \frac{1}{c}\right) \frac{1}{z} + \frac{1}{c}\, m,\; z \right). \tag{7.13}$$

Given $L^A_{mz}$, we can obtain $L^B_{mz}$ by using (7.13). Hence $B_N \mapsto B \in \mathcal{M}_{\mathrm{alg}}$. □

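Theorem 7.34. Let $A_N \mapsto A \in \mathcal{M}_{\mathrm{alg}}$. Then
$$B_N = (A_N)^2 \mapsto B \in \mathcal{M}_{\mathrm{alg}}.$$

Proof. Here we have $h(x) = x^2$, which is continuous everywhere. From Proposition
7.25, $B_N \mapsto B$. The Stieltjes transform of $F^B$ can be expressed as

$$m_B(z) = \int \frac{1}{x^2 - z}\, dF^A(x). \tag{7.14}$$

Equation (7.14) can be rewritten as

$$m_B(z) = \frac{1}{2\sqrt{z}} \int \frac{1}{x - \sqrt{z}}\, dF^A(x) - \frac{1}{2\sqrt{z}} \int \frac{1}{x + \sqrt{z}}\, dF^A(x) \tag{7.15}$$
$$= \frac{1}{2\sqrt{z}}\, m_A(\sqrt{z}) - \frac{1}{2\sqrt{z}}\, m_A(-\sqrt{z}). \tag{7.16}$$

Equation (7.15) leads to the operational law

$$L^B_{mz}(m, z) = L^A_{mz}(2m\sqrt{z}, \sqrt{z}) \boxplus_m L^A_{mz}(-2m\sqrt{z}, -\sqrt{z}). \tag{7.17}$$

Given $L^A_{mz}$ we can obtain $L^B_{mz}$ by using (7.17). This proves that $B_N \mapsto B \in \mathcal{M}_{\mathrm{alg}}$. □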
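The partial-fraction step behind (7.15) and (7.16) can be verified numerically for a discrete distribution. The sketch below is illustrative only: the atoms, weights and test point are hypothetical, and `stieltjes` is a helper name we introduce. Note that the identity does not depend on which branch of the square root is taken, since swapping $\sqrt{z}$ for $-\sqrt{z}$ leaves the right-hand side unchanged.

```python
import cmath


def stieltjes(atoms, weights, z):
    """m(z) = sum_i w_i / (x_i - z) for a discrete distribution function."""
    return sum(w / (x - z) for x, w in zip(atoms, weights))


atoms_a = [-2.0, 0.5, 1.0]   # hypothetical atoms of the limit of A (negative values allowed)
weights = [0.3, 0.3, 0.4]    # hypothetical weights, summing to one
z = 0.7 + 0.4j               # test point off the real axis

# Direct route: the atoms of B = A^2 are the squared atoms of A.
atoms_b = [x * x for x in atoms_a]
direct = stieltjes(atoms_b, weights, z)

# Route via (7.16): m_B(z) = [m_A(sqrt(z)) - m_A(-sqrt(z))] / (2 sqrt(z)).
root = cmath.sqrt(z)
via_law = (stieltjes(atoms_a, weights, root) - stieltjes(atoms_a, weights, -root)) / (2 * root)

print(abs(direct - via_law))  # difference is at rounding level
```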
Theorem 7.35. Let $A_n \mapsto A \in \mathcal{M}_{\mathrm{alg}}$ and $B_N \mapsto B \in \mathcal{M}_{\mathrm{alg}}$. Then,
$$C_M = \mathrm{diag}(A_n, B_N) \mapsto C \in \mathcal{M}_{\mathrm{alg}},$$
where $M = n + N$ and $n/M \to c > 0$ as $n, N \to \infty$.

Proof. Let $C_M$ be the $M \times M$ block diagonal matrix formed from the $n \times n$ matrix $A_n$
and the $N \times N$ matrix $B_N$. Let $c_M = n/M$. The e.d.f. of $C_M$ is given by

$$F^{C_M} = c_M F^{A_n} + (1 - c_M) F^{B_N}.$$

Let $n, N \to \infty$ and $c_M = n/M \to c$. If $F^{A_n}$ and $F^{B_N}$ converge in distribution almost
surely (or in probability) to non-random d.f.'s $F^A$ and $F^B$ respectively, then $F^{C_M}$
will also converge in distribution almost surely (or in probability) to a non-random
distribution function $F^C$ given by

$$F^C(x) = c\, F^A(x) + (1 - c)\, F^B(x). \tag{7.18}$$

The Stieltjes transform of the distribution function $F^C$ can hence be written in terms
of the Stieltjes transforms of the distribution functions $F^A$ and $F^B$ as

$$m_C(z) = c\, m_A(z) + (1 - c)\, m_B(z). \tag{7.19}$$

Equation (7.19) can be expressed as an operational law on the bivariate polynomial
$L^A_{mz}(m, z)$ as

$$L^C_{mz}(m, z) = L^A_{mz}\left(\frac{m}{c}, z\right) \boxplus_m L^B_{mz}\left(\frac{m}{1 - c}, z\right). \tag{7.20}$$

Given $L^A_{mz}$ and $L^B_{mz}$, and the definition of the $\boxplus_m$ operator in Section 6.3, $L^C_{mz}$ is a
polynomial which can be constructed explicitly. This proves that $C_M \mapsto C \in \mathcal{M}_{\mathrm{alg}}$. □


Theorem 7.36. If $A_n = \mathrm{diag}(B_N, \alpha I_{n-N})$, where $\alpha$ is a real-valued scalar, then
$$B_N \mapsto B \in \mathcal{M}_{\mathrm{alg}},$$
as $n, N \to \infty$ with $c_N = n/N \to c$.

PROOF. Assume that as $n, N \to \infty$, $c_N = n/N \to c$. As we did in the proof of Theorem
7.35, we can show that the Stieltjes transform $m_A(z)$ can be expressed in terms of $m_B(z)$
as

$$m_A(z) = \left(1 - \frac{1}{c}\right) \frac{1}{\alpha - z} + \frac{1}{c}\, m_B(z). \tag{7.21}$$

This allows us to express $L^B_{mz}(m, z)$ in terms of $L^A_{mz}(m, z)$ using the relationship in
(7.21) as

$$L^B_{mz}(m, z) = L^A_{mz}\left( \left(1 - \frac{1}{c}\right) \frac{1}{\alpha - z} + \frac{1}{c}\, m,\; z \right). \tag{7.22}$$

We can hence obtain $L^B_{mz}$ from $L^A_{mz}$ using (7.22). This proves that $B_N \mapsto B \in \mathcal{M}_{\mathrm{alg}}$. □

Corollary 7.37. Let $A_n \mapsto A \in \mathcal{M}_{\mathrm{alg}}$. Then
$$B_N = \mathrm{diag}(A_n, \alpha I_{N-n}) \mapsto B \in \mathcal{M}_{\mathrm{alg}},$$
for $n/N \to c > 0$ as $n, N \to \infty$.

PROOF. This follows directly from Theorem 7.35. □

■  7.4  Gaussian-like matrix operations on algebraic random matrices

We now consider some simple stochastic transformations that "blur" the eigenvalues of
$A_N$ by injecting additional randomness. We show that canonical operations involving
an algebraic random matrix $A_N$ and Gaussian-like and Wishart-like random matrices
(defined next) produce an algebraic random matrix $B_N$.

Definition 7.41 (Gaussian-like random matrix). Let $Y_{N,L}$ be an $N \times L$ matrix
with independent, identically distributed (i.i.d.) elements having zero mean, unit
variance and bounded higher order moments. We label the matrix $G_{N,L} = \frac{1}{\sqrt{L}} Y_{N,L}$ as a
Gaussian-like random matrix.

We can sample a Gaussian-like random matrix in Matlab as

    G = sign(randn(N,L))/sqrt(L);

Gaussian-like matrices are labelled thus because they exhibit the same limiting behavior
in the $N \to \infty$ limit as "pure" Gaussian matrices, which may be sampled in Matlab
as

    G = randn(N,L)/sqrt(L);

Definition 7.42 (Wishart-like random matrix). Let $G_{N,L}$ be a Gaussian-like random
matrix. We label the matrix $W_N = G_{N,L} \times G'_{N,L}$ as a Wishart-like random matrix.
Let $c_N = N/L$. We denote a Wishart-like random matrix thus formed by $W_N(c_N)$.
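For readers without Matlab, the sign-matrix construction above can be mimicked with the Python standard library. This is only a sketch under our own naming (`gaussian_like` and `wishart_like` are hypothetical helpers, and the seed and sizes are arbitrary); it stores matrices as lists of rows and makes no attempt at efficiency.

```python
import math
import random


def gaussian_like(n_rows, n_cols, rng):
    """N x L matrix with i.i.d. +/-1 entries (zero mean, unit variance) scaled by 1/sqrt(L)."""
    scale = 1.0 / math.sqrt(n_cols)
    return [[rng.choice((-1.0, 1.0)) * scale for _ in range(n_cols)]
            for _ in range(n_rows)]


def wishart_like(g):
    """W = G G' for a matrix G stored as a list of rows."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in g]
            for row_i in g]


rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
N, L = 4, 8
G = gaussian_like(N, L, rng)
W = wishart_like(G)

# Every row of G has L entries of squared magnitude 1/L, so each diagonal
# entry of W equals one and the trace equals N (up to rounding).
trace = sum(W[i][i] for i in range(N))
print(trace)
```

With sign entries the diagonal of $W_N$ is deterministic; with Gaussian entries it would only concentrate around one as $L$ grows.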
Remark 7.43 (Algebraicity of Wishart-like random matrices). The limiting
eigenvalue distribution of the Wishart-like random matrix has the Marčenko-Pastur
density, which is an algebraic density since $L^W_{mz}$ exists (see Table 6.2(b)).

Proposition 7.44. Assume that $G_{N,L}$ is an $N \times L$ Gaussian-like random matrix. Let
$A_N \overset{a.s.}{\longmapsto} A$ be an $N \times N$ symmetric/Hermitian random matrix and $T_L \overset{a.s.}{\longmapsto} T$ be an $L \times L$
diagonal atomic random matrix respectively. If $G_{N,L}$, $A_N$ and $T_L$ are independent
then $B_N = A_N + G_{N,L} T_L G'_{N,L} \overset{a.s.}{\longmapsto} B$, as $c_L = N/L \to c$ for $N, L \to \infty$. The Stieltjes
transform $m_B(z)$ of the unique distribution function $F^B$ satisfies the equation

$$m_B(z) = m_A\left( z - c \int \frac{x\, dF^T(x)}{1 + x\, m_B(z)} \right). \tag{7.23}$$

Proof. This result may be found in Marčenko-Pastur [59] and Silverstein [86]. □


We  can  reformulate  Proposition  7.44  to  obtain  the  following  result  on  algebraic  random 
matrices. 


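Theorem 7.45. Let $A_N$, $G_{N,L}$ and $T_L$ be defined as in Proposition 7.44. Then
$$B_N = A_N + G_{N,L} T_L G'_{N,L} \overset{a.s.}{\longmapsto} B \in \mathcal{M}_{\mathrm{alg}},$$
as $c_L = N/L \to c$ for $N, L \to \infty$.

Proof. Let $T_L$ be an atomic matrix with $d$ atomic masses of weight $p_i$ and magnitude
$\lambda_i$ for $i = 1, 2, \ldots, d$. From Proposition 7.44, $m_B(z)$ can be written in terms of $m_A(z)$
as

$$m_B(z) = m_A\left( z - c \sum_{i=1}^{d} \frac{p_i \lambda_i}{1 + \lambda_i\, m_B(z)} \right), \tag{7.24}$$

where we have substituted $F^T(x) = \sum_{i=1}^{d} p_i\, I_{[\lambda_i, \infty)}$ into (7.23) with $\sum_i p_i = 1$.

Equation (7.24) can be expressed as an operational law on the bivariate polynomial
$L^A_{mz}$ as

$$L^B_{mz}(m, z) = L^A_{mz}(m, z - \alpha_m), \tag{7.25}$$

where $\alpha_m = c \sum_{i=1}^{d} p_i \lambda_i / (1 + \lambda_i m)$. This proves that $B_N \overset{a.s.}{\longmapsto} B \in \mathcal{M}_{\mathrm{alg}}$. □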


Proposition 7.46. Assume that $W_N(c_N)$ is an $N \times N$ Wishart-like random matrix.
Let $A_N \overset{a.s.}{\longmapsto} A$ be an $N \times N$ random Hermitian non-negative definite matrix. If $W_N(c_N)$
and $A_N$ are independent then $B_N = A_N \times W_N(c_N) \overset{a.s.}{\longmapsto} B$ as $c_N \to c$. The Stieltjes
transform $m_B(z)$ of the unique distribution function $F^B$ satisfies

$$m_B(z) = \int \frac{dF^A(x)}{\{1 - c - c\, z\, m_B(z)\}\, x - z}. \tag{7.26}$$

PROOF. This result may be found in Bai and Silverstein [9, 86]. □


We  can  reformulate  Proposition  7.46  to  obtain  the  following  result  on  algebraic  random 
matrices. 


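Theorem 7.47. Let $A_N$ and $W_N(c_N)$ satisfy the hypothesis of Proposition 7.46. Then,
$$B_N = A_N \times W_N(c_N) \overset{a.s.}{\longmapsto} B \in \mathcal{M}_{\mathrm{alg}},$$
as $c_N \to c$.

PROOF. By rearranging the terms in the numerator and denominator, (7.26) can be
rewritten as

$$m_B(z) = \frac{1}{1 - c - c\, z\, m_B(z)} \int \frac{dF^A(x)}{x - \dfrac{z}{1 - c - c\, z\, m_B(z)}}. \tag{7.27}$$

Let $\alpha_{m,z} = 1 - c - c\, z\, m_B(z)$ so that (7.27) can be rewritten as

$$m_B(z) = \frac{1}{\alpha_{m,z}} \int \frac{dF^A(x)}{x - (z/\alpha_{m,z})}. \tag{7.28}$$

We can express $m_B(z)$ in (7.28) in terms of $m_A(z)$ as

$$m_B(z) = \frac{1}{\alpha_{m,z}}\, m_A(z/\alpha_{m,z}). \tag{7.29}$$

Equation (7.29) can be rewritten as

$$m_A(z/\alpha_{m,z}) = \alpha_{m,z}\, m_B(z). \tag{7.30}$$

Equation (7.30) can be expressed as an operational law on the bivariate polynomial
$L^A_{mz}$ as

$$L^B_{mz}(m, z) = L^A_{mz}(\alpha_{m,z}\, m,\; z/\alpha_{m,z}). \tag{7.31}$$

This proves that $B_N \overset{a.s.}{\longmapsto} B \in \mathcal{M}_{\mathrm{alg}}$. □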
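A quick sanity check of (7.29) is the special case $A_N = I_N$, for which $B_N = W_N(c_N)$ and the limit should be the Marčenko-Pastur law. The sketch below assumes that law is encoded by the polynomial $c z m^2 - (1 - c - z) m + 1$ quoted in the caption of Figure 8-1; the helper names and the test values of $c$ and $z$ are ours. It confirms numerically that both roots of that quadratic satisfy the fixed-point relation (7.29) with $m_A(z) = 1/(1 - z)$.

```python
import cmath


def mp_roots(c, z):
    """Both roots in m of c*z*m^2 - (1 - c - z)*m + 1 = 0."""
    b = -(1 - c - z)
    disc = cmath.sqrt(b * b - 4 * c * z)
    return [(-b + disc) / (2 * c * z), (-b - disc) / (2 * c * z)]


def m_identity(z):
    """Stieltjes transform of a unit atom at 1, i.e. the limit of A = I."""
    return 1.0 / (1.0 - z)


def residual_7_29(m, c, z):
    """m - (1/alpha) * m_A(z/alpha) with alpha = 1 - c - c*z*m, cf. (7.29)."""
    alpha = 1 - c - c * z * m
    return m - m_identity(z / alpha) / alpha


c, z = 0.5, 1.2 + 0.3j   # hypothetical ratio and test point
residuals = [abs(residual_7_29(m, c, z)) for m in mp_roots(c, z)]
print(residuals)          # both residuals are at rounding level
```

Both algebraic roots satisfy the relation; selecting the one that is a genuine Stieltjes transform is a separate step, discussed in Section 8.2.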


Proposition 7.48. Assume that $G_{N,L}$ is an $N \times L$ Gaussian-like random matrix. Let
$A_N \overset{a.s.}{\longmapsto} A$ be an $N \times N$ symmetric/Hermitian random matrix independent of $G_{N,L}$.
Let $A_N^{1/2}$ denote an $N \times L$ matrix. If $s$ is a positive real-valued scalar then
$B_N = (A_N^{1/2} + \sqrt{s}\, G_{N,L})(A_N^{1/2} + \sqrt{s}\, G_{N,L})' \overset{a.s.}{\longmapsto} B$, as $c_L = N/L \to c$ for $N, L \to \infty$.
The Stieltjes transform $m_B(z)$ of the unique distribution function $F^B$ satisfies the
equation

$$m_B(z) = -\int \frac{dF^A(x)}{z\, \{1 + s\, c\, m_B(z)\} - \dfrac{x}{1 + s\, c\, m_B(z)} + s\,(c - 1)}. \tag{7.32}$$

PROOF. This result is found in Dozier and Silverstein [30]. □

We  can  reformulate  Proposition  7.48  to  obtain  the  following  result  on  algebraic  random 
matrices. 


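Theorem 7.49. Assume $A_N$, $G_{N,L}$ and $s$ satisfy the hypothesis of Proposition 7.48.
Then
$$B_N = (A_N^{1/2} + \sqrt{s}\, G_{N,L})(A_N^{1/2} + \sqrt{s}\, G_{N,L})' \overset{a.s.}{\longmapsto} B \in \mathcal{M}_{\mathrm{alg}},$$
as $c_L = N/L \to c$ for $N, L \to \infty$.

Proof. By rearranging the terms in the numerator and denominator, (7.32) can be
rewritten as

$$m_B(z) = \int \frac{\{1 + s\, c\, m_B(z)\}\, dF^A(x)}{x - \{1 + s\, c\, m_B(z)\}\left(z\, \{1 + s\, c\, m_B(z)\} + (c - 1)\, s\right)}. \tag{7.33}$$

Let $\alpha_m = 1 + s\, c\, m_B(z)$ and $\beta_m = \{1 + s\, c\, m_B(z)\}\left(z\, \{1 + s\, c\, m_B(z)\} + (c - 1)\, s\right)$, so that
$\beta_m = \alpha_m^2 z + \alpha_m s (c - 1)$. Equation (7.33) can hence be rewritten as

$$m_B(z) = \alpha_m \int \frac{dF^A(x)}{x - \beta_m}. \tag{7.34}$$

Using the definition of the Stieltjes transform in (6.1), we can express $m_B(z)$ in (7.34)
in terms of $m_A(z)$ as

$$m_B(z) = \alpha_m\, m_A(\beta_m) = \alpha_m\, m_A\left(\alpha_m^2 z + \alpha_m (c - 1)\, s\right). \tag{7.35}$$

Equation (7.35) can, equivalently, be rewritten as

$$m_A\left(\alpha_m^2 z + \alpha_m (c - 1)\, s\right) = \frac{1}{\alpha_m}\, m_B(z). \tag{7.36}$$

Equation (7.36) can be expressed as an operational law on the bivariate polynomial
$L^A_{mz}$ as

$$L^B_{mz}(m, z) = L^A_{mz}\left(m/\alpha_m,\; \alpha_m^2 z + \alpha_m s (c - 1)\right). \tag{7.37}$$

This proves that $B_N \overset{a.s.}{\longmapsto} B \in \mathcal{M}_{\mathrm{alg}}$. □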
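As a consistency check on (7.37), consider the degenerate case in which the signal part vanishes, so that $B_N = s\, G_{N,L} G'_{N,L}$. The short derivation below is a sketch under one assumption of ours: that the zero matrix, whose Stieltjes transform is $m_A(z) = -1/z$, is encoded by $L^A_{mz}(m, z) = z m + 1$ (any non-zero multiple would do equally well). Here $\alpha_m = 1 + s c m$, as in the proof above.

```latex
\begin{align*}
L^B_{mz}(m, z)
  &= L^A_{mz}\!\left(\frac{m}{\alpha_m},\; \alpha_m^2 z + \alpha_m s (c - 1)\right) \\
  &= \left(\alpha_m^2 z + \alpha_m s (c - 1)\right) \frac{m}{\alpha_m} + 1 \\
  &= \left(\alpha_m z + s (c - 1)\right) m + 1 \\
  &= s c z\, m^2 + \left(z + s (c - 1)\right) m + 1 .
\end{align*}
```

For $s = 1$ this reduces to $c z m^2 - (1 - c - z) m + 1$, which is the Marčenko-Pastur polynomial quoted in the caption of Figure 8-1, as one would expect for a Wishart-like matrix.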
■  7.5  Sums and products of algebraic random matrices

Proposition 7.51. Let $A_N \overset{p}{\longmapsto} A$ and $B_N \overset{p}{\longmapsto} B$ be $N \times N$ symmetric/Hermitian random
matrices. Let $Q_N$ be a Haar distributed unitary/orthogonal matrix independent of
$A_N$ and $B_N$. Then $C_N = A_N + Q_N B_N Q'_N \overset{p}{\longmapsto} C$. The associated distribution function
$F^C$ is the unique distribution function whose R transform satisfies

$$r_C(g) = r_A(g) + r_B(g). \tag{7.38}$$

Proof. This result was obtained by Voiculescu in [106]. □


We  can  reformulate  Proposition  7.51  to  obtain  the  following  result  on  algebraic  random 
matrices. 


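Theorem 7.52. Assume that $A_N$, $B_N$ and $Q_N$ satisfy the hypothesis of Proposition
7.51. Then,
$$C_N = A_N + Q_N B_N Q'_N \overset{p}{\longmapsto} C \in \mathcal{M}_{\mathrm{alg}}.$$

Proof. Equation (7.38) can be expressed as an operational law on the bivariate
polynomials $L^A_{rg}$ and $L^B_{rg}$ as

$$L^C_{rg} = L^A_{rg} \boxplus_r L^B_{rg}. \tag{7.39}$$

If $L_{mz}$ exists then so does $L_{rg}$ and vice-versa. This proves that $C_N \overset{p}{\longmapsto} C \in \mathcal{M}_{\mathrm{alg}}$. □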


Proposition 7.53. Let $A_N \overset{p}{\longmapsto} A$ and $B_N \overset{p}{\longmapsto} B$ be $N \times N$ symmetric/Hermitian random
matrices. Let $Q_N$ be a Haar distributed unitary/orthogonal matrix independent of
$A_N$ and $B_N$. Then $C_N = A_N \times Q_N B_N Q'_N \overset{p}{\longmapsto} C$, where $C_N$ is defined only if $C_N$ has
real eigenvalues for every sequence $A_N$ and $B_N$. The associated distribution function
$F^C$ is the unique distribution function whose S transform satisfies

$$s_C(y) = s_A(y)\, s_B(y). \tag{7.40}$$

Proof. This result was obtained by Voiculescu in [107, 108]. □


We can reformulate Proposition 7.53 to obtain the following result on algebraic random
matrices.

Theorem 7.54. Assume that $A_N$ and $B_N$ satisfy the hypothesis of Proposition 7.53.
Then
$$C_N = A_N \times Q_N B_N Q'_N \overset{p}{\longmapsto} C \in \mathcal{M}_{\mathrm{alg}}.$$

PROOF. Equation (7.40) can be expressed as an operational law on the bivariate
polynomials $L^A_{sy}$ and $L^B_{sy}$ as

$$L^C_{sy} = L^A_{sy} \boxtimes_s L^B_{sy}. \tag{7.41}$$

If $L_{mz}$ exists then so does $L_{sy}$ and vice versa. This proves that $C_N \overset{p}{\longmapsto} C \in \mathcal{M}_{\mathrm{alg}}$. □


Definition 7.55 (Orthogonally/Unitarily invariant random matrix). If the
joint distribution of the elements of a random matrix is invariant under
orthogonal/unitary transformations, it is referred to as an orthogonally/unitarily invariant
random matrix.

If $A_N$ (or $B_N$) or both are orthogonally/unitarily invariant sequences of random
matrices then Theorems 7.52 and 7.54 can be stated more simply.

Corollary 7.56. Let $A_N \overset{p}{\longmapsto} A \in \mathcal{M}_{\mathrm{alg}}$ and let $B_N \overset{p}{\longmapsto} B \in \mathcal{M}_{\mathrm{alg}}$ be an
orthogonally/unitarily invariant random matrix independent of $A_N$. Then,

1. $C_N = A_N + B_N \overset{p}{\longmapsto} C \in \mathcal{M}_{\mathrm{alg}}$,

2. $C_N = A_N \times B_N \overset{p}{\longmapsto} C \in \mathcal{M}_{\mathrm{alg}}$.

Here, multiplication is defined only if $C_N$ has real eigenvalues for every sequence $A_N$
and $B_N$.

When both the limiting eigenvalue distributions of $A_N$ and $B_N$ have compact support,
it is possible to strengthen the mode of convergence in Theorems 7.52 and 7.54 to almost
surely [46]. We suspect that almost sure convergence must hold when the distributions
are not compactly supported; this remains an open problem.


Chapter  8 


The  polynomial  method: 
Computational  aspects 


■  8.1  Operational  laws  on  bivariate  polynomials 

The key idea behind the definition of algebraic random matrices in Chapter 7 was that
when the limiting eigenvalue distribution of a random matrix can be encoded by a
bivariate polynomial, then for the broad class of random matrix operations identified in
Chapter 7, algebraicity of the eigenvalue distribution is preserved under the
transformation.

Our proofs relied on exploiting the fact that some random matrix transformations,
say $A_N \mapsto B_N$, could be most naturally expressed as transformations of $L^A_{mz} \mapsto L^B_{mz}$;
others as $L^A_{rg} \mapsto L^B_{rg}$ while some as $L^A_{sy} \mapsto L^B_{sy}$. Hence, we manipulate the bivariate
polynomials to the form needed to apply the appropriate operational law, which we
ended up deriving as part of the proof, and then reverse the transformations to obtain
the bivariate polynomial $L^B_{mz}$. Once we have derived the operational law for computing
$L^B_{mz}$ from $L^A_{mz}$, we have established the algebraicity of the limiting eigenvalue
distribution of $B_N$ and we are done.

These operational laws, the associated random matrix transformation and the
symbolic Matlab code for the operational law are summarized in Tables 8.1-8.3. The
remainder of this chapter discusses techniques for extracting the density function from
the polynomial and the special structure in the moments that allows them to be
efficiently enumerated using symbolic methods.
Table 8.1. Operational laws on the bivariate polynomial encodings (and their computational realization
in Matlab) corresponding to a class of deterministic and stochastic transformations. The Gaussian-like
random matrix $G$ is $N \times L$, the Wishart-like matrix is $W(c) = GG'$ where $N/L \to c > 0$ as
$N, L \to \infty$, and the matrix $T$ is a diagonal atomic random matrix.


Operational law:

    $L^A_{mz}$  →  { $L^A_{mz}(2m\sqrt{z}, \sqrt{z})$ ,  $L^A_{mz}(-2m\sqrt{z}, -\sqrt{z})$ }  →  $\boxplus_m$  →  $L^B_{mz}$

Matlab code:

    function LmzB = squareA(LmzA)
    syms m z
    Lmz1 = subs(LmzA,z,sqrt(z));
    Lmz1 = subs(Lmz1,m,2*m*sqrt(z));
    Lmz2 = subs(LmzA,z,-sqrt(z));
    Lmz2 = subs(Lmz2,m,-2*m*sqrt(z));
    LmzB = L1plusL2(Lmz1,Lmz2,m);
    LmzB = irreducLuv(LmzB,m,z);

(a) $L^A_{mz} \mapsto L^B_{mz}$ for $A \mapsto B = A^2$.


Operational law:

    $L^A_{mz}$, $L^B_{mz}$  →  { $L^A_{mz}(m/c, z)$ ,  $L^B_{mz}(m/(1-c), z)$ }  →  $\boxplus_m$  →  $L^C_{mz}$

Matlab code:

    function LmzC = AblockB(LmzA,LmzB,c)
    syms m z mu
    LmzA1 = subs(LmzA,m,m/c);
    LmzB1 = subs(LmzB,m,m/(1-c));
    LmzC = L1plusL2(LmzA1,LmzB1,m);
    LmzC = irreducLuv(LmzC,m,z);

(b) $L^A_{mz}, L^B_{mz} \mapsto L^C_{mz}$ for $A, B \mapsto C = \mathrm{diag}(A, B)$ where Size of A / Size of C $\to c$.


Table 8.2. Operational laws on the bivariate polynomial encodings for some deterministic random
matrix transformations. The operations $\boxplus_u$ and $\boxtimes_u$ are defined in Table 6.5.
Operational law:

    $L^A_{mz}$, $L^B_{mz}$  →  $L^A_{rg}$, $L^B_{rg}$  →  $\boxplus_r$  →  $L^C_{rg}$  →  $L^C_{mz}$

Matlab code:

    function LmzC = AplusB(LmzA,LmzB)
    syms m z r g
    LrgA = Lmz2Lrg(LmzA);
    LrgB = Lmz2Lrg(LmzB);
    LrgC = L1plusL2(LrgA,LrgB,r);
    LmzC = Lrg2Lmz(LrgC);

(a) $L^A_{mz}, L^B_{mz} \mapsto L^C_{mz}$ for $A, B \mapsto C = A + QBQ'$.


Operational law:

    $L^A_{mz}$, $L^B_{mz}$  →  $L^A_{sy}$, $L^B_{sy}$  →  $\boxtimes_s$  →  $L^C_{sy}$  →  $L^C_{mz}$

Matlab code:

    function LmzC = AtimesB(LmzA,LmzB)
    syms m z s y
    LsyA = Lmz2Lsy(LmzA);
    LsyB = Lmz2Lsy(LmzB);
    LsyC = L1timesL2(LsyA,LsyB,s);
    LmzC = Lsy2Lmz(LsyC);

(b) $L^A_{mz}, L^B_{mz} \mapsto L^C_{mz}$ for $A, B \mapsto C = A \times QBQ'$.


Table 8.3. Operational laws on the bivariate polynomial encodings for some canonical random matrix
transformations. The operations $\boxplus_u$ and $\boxtimes_u$ are defined in Table 6.5.
■  8.2  Interpreting the solution curves of polynomial equations

Consider a bivariate polynomial $L_{mz}$. Let $D_m$ be the degree of $L_{mz}(m, z)$ with respect
to $m$ and $l_k(z)$, for $k = 0, \ldots, D_m$, be polynomials in $z$ that are the coefficients of
$m^k$. For every $z$ along the real axis, there are at most $D_m$ solutions to the polynomial
equation $L_{mz}(m, z) = 0$. The solutions of the bivariate polynomial equation $L_{mz} = 0$
define a locus of points $(m, z)$ in $\mathbb{C} \times \mathbb{C}$ referred to as a complex algebraic curve. Since
the limiting density is over $\mathbb{R}$, we may focus on real values of $z$.

For almost every $z \in \mathbb{R}$, there will be $D_m$ values of $m$. The exception consists of the
singularities of $L_{mz}(m, z)$. A singularity occurs at $z = z_0$ if:

•  There is a reduction in the degree of $m$ at $z_0$ so that there are fewer than $D_m$ roots
for $z = z_0$. This occurs when $l_{D_m}(z_0) = 0$. Poles of $L_{mz}(m, z)$ occur if some of
the $m$-solutions blow up to infinity.

•  There are multiple roots of $L_{mz}$ at $z_0$ so that some of the values of $m$ coalesce.

The singularities constitute the so-called exceptional set of $L_{mz}(m, z)$. Singularity
analysis, in the context of algebraic functions, is a well studied problem [37] from which
we know that the singularities of $L_{mz}(m, z)$ are constrained to be branch points.

A branch of the algebraic curve $L_{mz}(m, z) = 0$ is the choice of a locally analytic
function $m_j(z)$ defined outside the exceptional set of $L_{mz}(m, z)$ together with a connected
region of the $\mathbb{C} \times \mathbb{R}$ plane throughout which this particular choice $m_j(z)$ is analytic.
These properties of singularities and branches of algebraic curves are helpful in
determining the atomic and non-atomic component of the encoded probability density from
$L_{mz}$. We note that, as yet, we do not have a fully automated algorithm for extracting
the limiting density function from the bivariate polynomial. Development of efficient
computational algorithms that exploit the algebraic properties of the solution curve
would be of great benefit to the community.

■  8.2.1  The  atomic  component 

If there are any atomic components in the limiting density function, they will necessarily
manifest themselves as poles of $L_{mz}(m, z)$. This follows from the definition of the
Stieltjes transform in (6.1). As mentioned in the discussion on the singularities of
algebraic curves, the poles are located at the roots of $l_{D_m}(z)$. These may be computed
in Maple using the sequence of commands:

    > Dm := degree(LmzA,m);
    > lDmz := coeff(LmzA,m,Dm);
    > poles := solve(lDmz=0,z);

We can then compute the Puiseux expansion about each of the poles at $z = z_0$.
This can be computed in Maple using the algcurves package as:

    > with(algcurves):
    > puiseux(Lmz,z=pole,m,1);

For the pole at $z = z_0$ we inspect the Puiseux expansions for branches with leading
term $1/(z_0 - z)$. An atomic component in the limiting spectrum occurs if and only if the
coefficient of such a branch is non-negative and not greater than one. This constraint
ensures that the branch is associated with the Stieltjes transform of a valid probability
distribution function.

Of course, as is often the case with algebraic curves, pathological cases can be
easily constructed. For example, more than one branch of the Puiseux expansion might
correspond to a candidate atomic component, i.e., the coefficients are non-negative
and not greater than one. In our experimentation, whenever this has happened it
has been possible to eliminate the spurious branch by matrix theoretic arguments.
Demonstrating this rigorously using analytical arguments remains an open problem.

Sometimes it is possible to encounter a double pole at $z = z_0$ corresponding to two
admissible weights. In such cases, empirical evidence suggests that the branch with the
largest coefficient (less than one) is the "right" Puiseux expansion, though we have no
theoretical justification for this choice.

■  8.2.2  The non-atomic component

The probability density function can be recovered from the Stieltjes transform by
applying the inversion formula in (6.4). Since the Stieltjes transform is encoded in the
bivariate polynomial $L_{mz}$, we accomplish this by first computing all $D_m$ roots along
$z \in \mathbb{R}$ (except at poles or singularities). There will be $D_m$ roots of which one solution
curve will be the "correct" solution, i.e., the non-atomic component of the desired
density function is the imaginary part of the correct solution normalized by $\pi$. In Matlab,
the $D_m$ roots can be computed using the sequence of commands:

    Lmz_roots = [];
    x_range = [x_start:x_step:x_end];
    for x = x_range
        Lmz_roots_unnorm = roots(sym2poly(subs(Lmz,z,x)));
        Lmz_roots = [Lmz_roots;
            real(Lmz_roots_unnorm) + i*imag(Lmz_roots_unnorm)/pi];
    end

The density of the limiting eigenvalue distribution function can, generically,
be expressed in closed form when $D_m = 2$. When using root-finding algorithms, for
$D_m = 2, 3$, the correct solution can often be easily identified: the imaginary branch
will always appear with its complex conjugate. The density is just the scaled (by $1/\pi$)
positive imaginary component.
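The $D_m = 2$ case can be illustrated with the Marčenko-Pastur polynomial $c z m^2 - (1 - c - z) m + 1$ quoted in the caption of Figure 8-1. The sketch below is ours rather than the thesis's Matlab: it solves the quadratic in $m$ at a real point $x$ with the standard library, keeps the root with non-negative imaginary part, scales by $1/\pi$, and compares against the well-known closed-form Marčenko-Pastur density with edges $(1 \pm \sqrt{c})^2$. The function names and evaluation points are hypothetical.

```python
import cmath
import math


def mp_density_from_polynomial(c, x):
    """Density at real x > 0 read off the roots of c*x*m^2 - (1 - c - x)*m + 1 = 0."""
    b = -(1 - c - x)
    disc = cmath.sqrt(b * b - 4 * c * x)
    roots = [(-b + disc) / (2 * c * x), (-b - disc) / (2 * c * x)]
    # The two roots are complex conjugates inside the support; keep the
    # non-negative imaginary part and normalize by pi.
    return max(root.imag for root in roots) / math.pi


def mp_density_closed_form(c, x):
    """Closed-form Marcenko-Pastur density (continuous part) for ratio c."""
    lo, hi = (1 - math.sqrt(c)) ** 2, (1 + math.sqrt(c)) ** 2
    if x <= lo or x >= hi:
        return 0.0
    return math.sqrt((hi - x) * (x - lo)) / (2 * math.pi * c * x)


c = 0.5
for x in (0.2, 0.5, 1.0, 2.0, 2.5):
    print(x, mp_density_from_polynomial(c, x), mp_density_closed_form(c, x))
```

Outside the support both roots are real, so the recovered density is zero there, which matches the closed form.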

When $D_m \ge 4$, except when $L_{mz}$ is bi-quadratic for $D_m = 4$, there is no choice but
to manually isolate the correct solution among the numerically computed $D_m$ roots of
the polynomial $L_{mz}(m, z)$ at each $z = z_0$. The class of algebraic random matrices whose
eigenvalue density function can be expressed in closed form is thus a much smaller
subset of the class of algebraic random matrices. When the underlying density function
is compactly supported, the boundary points will be singularities of the algebraic curve.
In particular, when the probability density function is compactly supported and the
boundary points are not poles, they occur at points where some values of $m$ coalesce.
These points are the roots of the discriminant of $L_{mz}$, computed in Maple as:

    > PossibleBoundaryPoints := solve(discrim(Lmz,m),z);
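For a quadratic in $m$ the discriminant is available by hand, so the boundary-point computation can be sketched without a computer algebra system. The example below assumes the Marčenko-Pastur polynomial $c z m^2 - (1 - c - z) m + 1$ from the caption of Figure 8-1, whose discriminant with respect to $m$ is $(1 - c - z)^2 - 4cz = z^2 - 2(1 + c) z + (1 - c)^2$. The function name is ours.

```python
import math


def mp_boundary_points(c):
    """Roots in z of the m-discriminant z^2 - 2*(1 + c)*z + (1 - c)^2."""
    half_b = 1 + c
    rad = math.sqrt(half_b ** 2 - (1 - c) ** 2)  # equals 2*sqrt(c)
    return half_b - rad, half_b + rad


lo, hi = mp_boundary_points(2.0)
print(lo, hi)  # candidate edges, expected to equal (1 -/+ sqrt(2))^2
```

The two roots coincide with $(1 \pm \sqrt{c})^2$, the edges of the Marčenko-Pastur support.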

We  suspect  that  “nearly  all1'  algebraic  random  matrices  with  compactly  supported 
eigenvalue  distribution  will  exhibit  a  square  root  type  behavior  near  boundary  points 
at  which  there  are  no  poles.  In  the  generic  case,  this  will  occur  whenever  the  boundary 
points  correspond  to  locations  where  two  branches  of  the  algebraic  curve  coalesce. 

For a class of random matrices that includes a subclass of algebraic random matrices,
this has been established in [87]. This endpoint behavior has also been observed for
orthogonally/unitarily invariant random matrices whose distribution has an element-wise
joint density function of the form

$$f(A) = C_N \exp(-N \,\mathrm{Tr}\, V(A))\, dA$$

where $V$ is an even degree polynomial with positive leading coefficient and $dA$ is the
Lebesgue measure on $N \times N$ symmetric/Hermitian matrices. In [27], it is shown that
these random matrices have a limiting mean eigenvalue density in the $N \to \infty$ limit that
is algebraic and compactly supported. The density typically vanishes like a square root
at the endpoints, though higher order vanishing at endpoints is possible and a full
classification is made in [28]. In [53] it is shown that square root vanishing is generic.
A similar classification for the general class of algebraic random matrices remains an
open problem.

Whether the encoded distribution is compactly supported or not, the $-1/z$ behavior
of the real part of the Stieltjes transform (the principal value) as $z \to \pm\infty$ helps isolate
the correct solution. In our experience, while multiple solution curves might exhibit
this behavior, invariably only one solution will have an imaginary branch that, when
normalized, will correspond to a valid probability density. Why this always appears to
be the case for the operational laws described is a bit of a mystery to us.
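The branch-selection heuristic can be sketched numerically. Assuming the Marčenko-Pastur polynomial $L_{mz} = czm^2 - (1 - c - z)m + 1$ from the caption of Figure 8-1 and the convention $m(z) = \int dF(x)/(x - z)$, the code below solves the quadratic in $m$ just above the real axis, keeps the root with positive imaginary part, and integrates $\mathrm{Im}\, m / \pi$ over the support. For $c = 2$ the continuous part should carry mass $1/c = 1/2$, the remaining $1/2$ being the atom at zero. The function names, the offset `eps` and the midpoint rule are illustrative choices, not part of the original method.

```python
import cmath
import math

def stieltjes_branch(x, c, eps=1e-9):
    """Root of c*z*m^2 - (1 - c - z)*m + 1 = 0 at z = x + i*eps with the larger Im(m)."""
    z = complex(x, eps)
    a, b, k = c * z, -(1.0 - c - z), 1.0
    disc = cmath.sqrt(b * b - 4.0 * a * k)
    roots = ((-b + disc) / (2.0 * a), (-b - disc) / (2.0 * a))
    return max(roots, key=lambda m: m.imag)

def continuous_mass(c, n=20000):
    """Midpoint-rule integral of Im(m)/pi between the two branch points."""
    lo, hi = (1 - math.sqrt(c)) ** 2, (1 + math.sqrt(c)) ** 2
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += stieltjes_branch(x, c).imag / math.pi
    return total * h

mass = continuous_mass(2.0)
print(mass)   # mass of the non-atomic part for c = 2
```

The other root has a negative imaginary part on the support and so cannot be normalized into a density, which is the numerical counterpart of the observation made in the text.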

Example: Consider the Marčenko-Pastur density encoded by $L_{mz}$ given in Table 6.2(b).
The Puiseux expansion about the pole at $z = 0$ (the only pole!) has coefficient $(1 - 1/c)$,
which corresponds to an atom only when $c > 1$ (as expected using a matrix theoretic
argument). Finally, the branch points at $(1 \pm \sqrt{c})^2$ correspond to boundary points of
the compactly supported probability density. Figure 8-1 plots the real and imaginary
parts of the algebraic curve for $c = 2$.

(a) Real component. The singularity at zero corresponds to an atom of weight 1/2. The
branch points at $(1 \pm \sqrt{2})^2$ correspond to the boundary points of the region of support.

(b) Imaginary component normalized by $\pi$. The positive component corresponds to the encoded
probability density function.

Figure 8-1. The real and imaginary components of the algebraic curve defined by the equation
$L_{mz}(m, z) = 0$, where $L_{mz} = czm^2 - (1 - c - z)m + 1$, which encodes the Marčenko-Pastur density.
The curve is plotted for $c = 2$. The $-1/z$ behavior of the real part of the "correct solution" as
$z \to \infty$ is the generic behavior exhibited by the real part of the Stieltjes transform of a valid
probability density function.

■  8.3  Enumerating  the  moments  and  free  cumulants

In principle, the moment generating function can be extracted from $L_{\mu z}$ by a Puiseux
expansion of the algebraic function $\mu(z)$ about $z = 0$. When the moments of an algebraic
probability distribution exist, there is additional structure in the moments and
free cumulants that allows us to enumerate them efficiently. For an algebraic probability
distribution, we conjecture that the moments of all orders exist if and only if the
distribution is compactly supported.

Definition 8.31 (Rational generating function). Let $\mathbb{R}[[u]]$ denote the ring of
formal power series (or generating functions) in $u$ with real coefficients. A formal
power series (or generating function) $v \in \mathbb{R}[[u]]$ is said to be rational if there exist
polynomials in $u$, $P(u)$ and $Q(u)$, $Q(0) \neq 0$, such that

$$v(u) = \frac{P(u)}{Q(u)}.$$

Definition 8.32 (Algebraic generating function). Let $\mathbb{R}[[u]]$ denote the ring of
formal power series (or generating functions) in $u$ with real coefficients. A formal
power series (or generating function) $v \in \mathbb{R}[[u]]$ is said to be algebraic if there exist
polynomials in $u$, $P_0(u), \ldots, P_{D_v}(u)$, not all identically zero, such that

$$P_0(u) + P_1(u)\, v + \ldots + P_{D_v}(u)\, v^{D_v} = 0.$$

The degree of $v$ is said to be $D_v$.

Definition 8.33 (D-finite generating function). Let $v \in \mathbb{R}[[u]]$. If there exist
polynomials $p_0(u), \ldots, p_d(u)$, such that

$$p_d(u)\, v^{(d)} + p_{d-1}(u)\, v^{(d-1)} + \ldots + p_1(u)\, v^{(1)} + p_0(u) = 0,$$    (8.1)

where $v^{(j)} = d^j v / du^j$, then we say that $v$ is a D-finite (short for differentiably finite)
generating function (or power series). The generating function, $v(u)$, is also referred
to as a holonomic function.

Definition 8.34 (P-recursive coefficients). Let $a_n$ for $n \geq 0$ denote the coefficients
of a D-finite series $v$. If there exist polynomials $P_0, \ldots, P_e \in \mathbb{R}[n]$ with $P_e \neq 0$, such
that

$$P_e(n)\, a_{n+e} + P_{e-1}(n)\, a_{n+e-1} + \ldots + P_0(n)\, a_n = 0,$$

for all $n \in \mathbb{N}$, then the coefficients $a_n$ are said to be P-recursive (short for polynomially
recursive).

Proposition 8.35. Let $v \in \mathbb{R}[[u]]$ be an algebraic power series of degree $D_v$. Then $v$ is
D-finite and satisfies an equation (8.1) of order $D_v$.

Proof. A proof appears in Stanley [94, pp. 187]. □

The structure of the limiting moments and free cumulants associated with algebraic
densities is described next.

Theorem 8.36. If $f_A \in \mathcal{P}_{alg}$, and the moments exist, then the moment and free cumulant
generating functions are algebraic power series. Moreover, both generating functions
are D-finite and the coefficients are P-recursive.

Proof. If $f_A \in \mathcal{P}_{alg}$, then $L^A_{mz}$ exists. Hence $L^A_{\mu z}$ and $L^A_{rg}$ exist, so that $\mu_A(z)$ and
$r_A(g)$ are algebraic power series. By Proposition 8.35 they are D-finite; the moments and
free cumulants are hence P-recursive. □

There are powerful symbolic tools available for enumerating the coefficients of
algebraic power series. The Maple based package gfun is one such example [77].
From the bivariate polynomial $L_{\mu z}$, we can obtain the series expansion up to degree
expansion_degree by using the commands:

> with(gfun):

> MomentSeries := algeqtoseries(Lmyuz, z, myu, expansion_degree, 'pos_slopes');

The option pos_slopes computes only those branches tending to zero. Similarly,
the free cumulants can be enumerated from $L_{rg}$ using the commands:

> with(gfun):

> FreeCumulantSeries := algeqtoseries(Lrg, g, r, expansion_degree, 'pos_slopes');

For computing expansions to a large order, it is best to work with the recurrence
relation. For an algebraic power series $v(u)$, the first number_of_terms coefficients can
be computed from $L_{uv}$ using the sequence of commands:

> with(gfun):

> deq := algeqtodiffeq(Luv, v(u));

> rec := diffeqtorec(deq, v(u), a(n));

> p_generator := rectoproc(rec, a(n), list):

> p_generator(number_of_terms);

Example: Consider the Marčenko-Pastur density encoded by the bivariate polynomials
listed in Table 6.2(b). Using the above sequence of commands, we can enumerate the
first five terms of its moment generating function as

$$\mu(z) = 1 + z + (c + 1) z^2 + (3c + c^2 + 1) z^3 + (6c^2 + c^3 + 6c + 1) z^4 + O(z^5).$$

The moment generating function is a D-finite power series and satisfies the first-order
differential equation

$$-z + zc - 1 + (-z - zc + 1)\, \mu(z) + (z^3 c^2 - 2 z^2 c - 2 z^3 c + z - 2 z^2 + z^3)\, \frac{d}{dz} \mu(z) = 0,$$

with initial condition $\mu(0) = 1$. The moments $M_n = a(n)$ themselves are P-recursive,
satisfying the finite depth recursion

$$(-2c + c^2 + 1)\, n\, a(n) + ((-2 - 2c)\, n - 3c - 3)\, a(n+1) + (3 + n)\, a(n+2) = 0$$

with the initial conditions $a(0) = 1$ and $a(1) = 1$. The free cumulants can be analogously
computed.
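As a cross-check on the recursion quoted in this example, the following Python sketch iterates it in exact rational arithmetic and compares the result with the Narayana-number expression $M_k = \sum_{j=1}^{k} \frac{1}{k}\binom{k}{j}\binom{k}{j-1} c^{\,j-1}$ for the Marčenko-Pastur moments. That closed form is a standard fact brought in for the comparison and is not stated in the text; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def mp_moments_by_recursion(c, count):
    """Moments M_0..M_{count-1} from the three-term P-recursion for a(n) = M_n."""
    c = Fraction(c)
    a = [Fraction(1), Fraction(1)]          # a(0) = a(1) = 1
    n = 0
    while len(a) < count:
        # (1 - c)^2 n a(n) + ((-2 - 2c) n - 3c - 3) a(n+1) + (3 + n) a(n+2) = 0
        rhs = (1 - c) ** 2 * n * a[n] + ((-2 - 2 * c) * n - 3 * c - 3) * a[n + 1]
        a.append(-rhs / (3 + n))
        n += 1
    return a[:count]

def mp_moments_narayana(c, count):
    """Closed form: M_k = sum_j N(k, j) c^(j-1), with N(k, j) the Narayana numbers."""
    c = Fraction(c)
    out = [Fraction(1)]
    for k in range(1, count):
        out.append(sum(Fraction(comb(k, j) * comb(k, j - 1), k) * c ** (j - 1)
                       for j in range(1, k + 1)))
    return out

print(mp_moments_by_recursion(2, 5))   # moments M_0..M_4 for c = 2
```

Because each new moment costs a fixed number of arithmetic operations, the recursion is the cheaper route when many moments are needed, which is the point made in the text about working with the recurrence relation for large orders.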

What we find rather remarkable is that for algebraic random matrices, it is often
possible to enumerate the moments in closed form even when the limiting density function
cannot be expressed in closed form. The linear recurrence satisfied by the moments may
be used to analyze their asymptotic growth.

When using the sequence of commands described, sometimes more than one solution
might emerge. In such cases, we have often found that one can identify the correct
solution by checking for the positivity of even moments or the condition $\mu(0) = 1$. More
sophisticated arguments might be needed for pathological cases. This might involve
verifying, using techniques such as those in [3], that the coefficients enumerated correspond
to the moments of a valid distribution function.


■  8.4  Computational  free  probability 

There is a deep connection between eigenvalue distributions of random matrices and
"free probability" (see Appendix A for a brief discussion). We now clarify the connection
between the operational law of a subclass of algebraic random matrices and
the convolution operations of free probability. This will bring into sharp focus how the
polynomial method constitutes a framework for computational free probability theory.

Proposition 8.41. Let $A_N \to A$ and $B_N \to B$ be two asymptotically free random
matrix sequences as defined in Appendix A. Then $A_N + B_N \to A + B$ and $A_N \times B_N \to AB$
(where the product is defined whenever $A_N \times B_N$ has real eigenvalues for every $A_N$ and
$B_N$) with the corresponding limit eigenvalue density functions, $f_{A+B}$ and $f_{AB}$, given by

$$f_{A+B} = f_A \boxplus f_B$$    (8.2a)

$$f_{AB} = f_A \boxtimes f_B$$    (8.2b)

where $\boxplus$ denotes free additive convolution and $\boxtimes$ denotes free multiplicative convolution.
These convolution operations can be expressed in terms of the R and S transforms as
described in Propositions 7.51 and 7.53 respectively.

Free additive convolution          $f_{A+B} = f_A \boxplus f_B$                  $L^{A+B}_{rg} = L^{A}_{rg} \boxplus_r L^{B}_{rg}$
Free multiplicative convolution    $f_{A \times B} = f_A \boxtimes f_B$          $L^{A \times B}_{sy} = L^{A}_{sy} \boxtimes_s L^{B}_{sy}$

Table 8.4. Implicit representation of the free convolution of two algebraic probability densities.


Proof. This result appears for density functions with compact support in [106,107].
It was later strengthened to the case of density functions with unbounded support.
See [46] for additional details and references. □

In Theorems 7.52 and 7.54 we, in effect, showed that the free convolution of algebraic
densities produces an algebraic density. This is stated succinctly next.

Corollary 8.42. Algebraic probability distributions form a semi-group under free
additive convolution.

Corollary 8.43. Algebraic distributions with positive semi-definite support form a
semi-group under free multiplicative convolution.


This establishes a framework for computational free probability theory by identifying
the class of distributions for which the free convolution operations produce a
"computable" distribution.

■  8.4.1  Implicitly  encoding  the  free  convolution  computations

The computational framework established relies on being able to implicitly encode free
convolution computations as a resultant computation on appropriate bivariate polynomials
as in Table 8.4. This leads to the obvious question: Are there other more effective
ways to implicitly encode free convolution computations? The answer to this rhetorical
question will bring into sharp focus the reason why the bivariate polynomial encoding
at the heart of the polynomial method is indispensable for any symbolic computational
implementation of free convolution. First, we answer the analogous question about the
most effective encoding for classical convolution computations.

Recall that classical convolution can be expressed in terms of the Laplace transform
of the distribution function. In what follows, we assume that the distributions have
finite moments.¹ Hence the Laplace transform can be written as a formal exponential
moment generating function. Classical additive and multiplicative convolution of two
distributions produces a distribution whose exponential moment generating function
equals the series (or Cauchy) product and the coefficient-wise (or Hadamard) product of
the individual exponential moment generating functions, respectively. Often, however,
the Laplace transform of either or both the individual distributions being convolved
cannot be written in closed form. The next best thing to do then is to find an implicit
way to encode the Laplace transform and to do the convolution computations via this
representation.

¹ In the general case, tools from complex analysis can be used to extend the argument.
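In terms of the moment sequences themselves, the two classical operations are short computations. The sketch below is an illustration under stated assumptions rather than part of the original development: for independent $X$ and $Y$, the Cauchy product of the exponential generating functions turns into a binomial sum for the moments of $X + Y$, and the moments of the product $XY$ are the coefficient-wise products of the individual moments. The function names and the standard normal test case are illustrative choices.

```python
from math import comb

def additive_moments(mx, my):
    """Moments of X + Y for independent X, Y.

    Multiplying the exponential generating functions sum_n M_n t^n / n!
    gives M_n(X + Y) = sum_k C(n, k) M_k(X) M_{n-k}(Y)."""
    n_terms = min(len(mx), len(my))
    return [sum(comb(n, k) * mx[k] * my[n - k] for k in range(n + 1))
            for n in range(n_terms)]

def multiplicative_moments(mx, my):
    """Moments of X * Y for independent X, Y: the coefficient-wise product."""
    return [a * b for a, b in zip(mx, my)]

# Moments M_0..M_4 of a standard normal random variable.
normal = [1, 0, 1, 0, 3]
sum_moments = additive_moments(normal, normal)
prod_moments = multiplicative_moments(normal, normal)
print(sum_moments, prod_moments)
```

The sum of two independent standard normals is normal with variance 2, so its second and fourth moments are 2 and 12, which the binomial sum reproduces.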

When this point of view is adopted, the task of identifying candidate encodings is
reduced to finding the class of representations of the exponential generating function
that remains closed under the Cauchy and Hadamard product. Clearly, rational generating
functions (see Definition 8.31) satisfy this requirement. It is shown in Theorem
6.4.12 [94, pp. 194] that D-finite generating functions (see Definition 8.33) satisfy this
requirement as well.

Proposition 8.35 establishes that all algebraic generating functions (see Definition
8.32) and, by extension, rational generating functions, are also D-finite. However, not all
D-finite generating functions are algebraic (see Exercise 6.1 [94, pp. 217] for a
counterexample), so that algebraic generating functions do not satisfy the closure requirement.
Furthermore, from Proposition 6.4.3 and Theorem 6.4.12 in [94], if the ordinary generating
function is D-finite then so is the exponential generating function and vice versa.
Thus D-finite generating functions are the largest class of generating functions for which
classical convolution computations can be performed via an implicit representation.

In the context of developing a computational framework based on the chosen implicit
representation, it is important to consider computability and algorithmic efficiency
issues. The class of D-finite functions is well-suited in that regard as well [77], so that we
regard it as the most effective class of representations in which the classical convolution
computations may be performed implicitly.

However, this class is inadequate for performing free convolution computations
implicitly. This is a consequence of the prominent role occupied in this theory by ordinary
generating functions. Specifically, the ordinary formal R and S power series are
obtained from the ordinary moment generating function by functional inversion (or
reversion), and are the key ingredients of free additive and multiplicative convolution (see
Propositions 8.41, 7.51 and 7.53). The task of identifying candidate encodings is thus
reduced to finding the class of representations of the ordinary moment generating function
that remains closed under addition, the Cauchy product, and reversion. D-finite
functions only satisfy the first two conditions and are hence unsuitable representations.

Algebraic functions do, however, satisfy all three conditions. The algorithmic efficiency
of computing the resultant (see Section 6.4) justifies our labelling of the bivariate
polynomial encoding as the most effective way of implicitly encoding free convolution
computations. The candidacy of constructibly D-finite generating functions [14], which
do not contain the class of D-finite functions but do contain the class of algebraic functions,
merits further investigation since they are closed under reversion, addition and
multiplication. Identifying classes of representations of generating functions for which
both the classical and free convolution computations can be performed implicitly and
effectively remains an important open problem.

Chapter 9


The polynomial method:
Applications


We illustrate the use of the computational techniques developed in Chapter 8 with some
examples.

■  9.1  The  Jacobi  random  matrix

The Jacobi matrix ensemble is defined in terms of two independent Wishart matrices
$W_1(c_1)$ and $W_2(c_2)$ as $J = (I + W_2(c_2)\, W_1^{-1}(c_1))^{-1}$. The subscripts are not to be
confused for the size of the matrices. Listing the computational steps needed to generate
a realization of this ensemble, as in Table 9.1, is the easiest way to identify the sequence
of random matrix operations needed to obtain $L^J_{mz}$.

Transformation: Initialization
    Numerical MATLAB code:   % Pick n, c1, c2
                             N1=n/c1; N2=n/c2;
    Symbolic MATLAB code:    % Define symbolic variables
                             syms m c z;

Transformation: $A_1 = I$
    Numerical MATLAB code:   A1 = eye(n,n);
    Symbolic MATLAB code:    Lmz1 = m*(1-z)-1;

Transformation: $A_2 = W_1(c_1) \times A_1$
    Numerical MATLAB code:   G1 = randn(n,N1)/sqrt(N1);
                             W1 = G1*G1';
                             A2 = W1*A1;
    Symbolic MATLAB code:    Lmz2 = AtimesWish(Lmz1,c1);

Transformation: $A_3 = A_2^{-1}$
    Numerical MATLAB code:   A3 = inv(A2);
    Symbolic MATLAB code:    Lmz3 = invA(Lmz2);

Transformation: $A_4 = W_2(c_2) \times A_3$
    Numerical MATLAB code:   G2 = randn(n,N2)/sqrt(N2);
                             W2 = G2*G2';
                             A4 = W2*A3;
    Symbolic MATLAB code:    Lmz4 = AtimesWish(Lmz3,c2);

Transformation: $A_5 = A_4 + I$
    Numerical MATLAB code:   A5 = A4+I;
    Symbolic MATLAB code:    Lmz5 = shiftA(Lmz4,1);

Transformation: $A_6 = A_5^{-1}$
    Numerical MATLAB code:   A6 = inv(A5);
    Symbolic MATLAB code:    Lmz6 = invA(Lmz5);
Table 9.1. Sequence of MATLAB commands for sampling the Jacobi ensemble. The functions used
to generate the corresponding bivariate polynomials symbolically are listed in Table 8.1.

We first start off with $A_1 = I$. The bivariate polynomial that encodes the Stieltjes
transform of its eigenvalue distribution function is given by

$$L^{1}_{mz}(m, z) = (1 - z)\, m - 1.$$    (9.1)

For $A_2 = W_1(c_1) \times A_1$, we can use (7.31) to obtain the bivariate polynomial

$$L^{2}_{mz}(m, z) = z c_1 m^2 - (-c_1 - z + 1)\, m + 1.$$    (9.2)

For $A_3 = A_2^{-1}$, from (7.7), we obtain the bivariate polynomial

$$L^{3}_{mz}(m, z) = z^2 c_1 m^2 + (c_1 z + z - 1)\, m + 1.$$    (9.3)

For $A_4 = W_2(c_2) \times A_3$, we can use (7.31) to obtain the bivariate polynomial

$$L^{4}_{mz}(m, z) = (c_1 z^2 + c_2 z)\, m^2 + (c_1 z + z - 1 + c_2)\, m + 1.$$    (9.4)

For $A_5 = A_4 + I$, from (7.7), we obtain the bivariate polynomial

$$L^{5}_{mz}(m, z) = ((z - 1)^2 c_1 + c_2 (z - 1))\, m^2 + (c_1 (z - 1) + z - 2 + c_2)\, m + 1.$$    (9.5)

Finally, for $J = A_6 = A_5^{-1}$, from (7.7), we obtain the required bivariate polynomial

$$L^{J}_{mz}(m, z) = L^{6}_{mz}(m, z) = (c_1 z + z^3 c_1 - 2 c_1 z^2 - c_2 z^3 + c_2 z^2)\, m^2
    + (-1 + 2z + c_1 - 3 c_1 z + 2 c_1 z^2 + c_2 z - 2 c_2 z^2)\, m - c_2 z - c_1 + 2 + c_1 z.$$    (9.6)

Using matrix theoretic arguments, it is clear that the random matrix ensembles $A_3, \ldots, A_6$
are defined only when $c_1 < 1$. There will be an atomic mass of weight $(1 - 1/c_2)$ at 1
whenever $c_2 > 1$. The non-atomic component of the distribution will have a region of
support $(a_-, a_+)$. The limiting density function for each of these ensembles can
be expressed as

$$f_{A_i}(x) = \frac{\sqrt{(x - a_-)(a_+ - x)}}{2 \pi\, l_2(x)} \quad \text{for } a_- < x < a_+,$$    (9.7)

for $i = 2, \ldots, 6$, where $a_+$, $a_-$ and the polynomials $l_2(x)$ are listed in Table 9.2.

$A_2$:  $l_2(x) = x c_1$
        $a_\pm = (1 \pm \sqrt{c_1})^2$

$A_3$:  $l_2(x) = x^2 c_1$
        $a_\pm = \dfrac{1}{(1 \mp \sqrt{c_1})^2}$

$A_4$:  $l_2(x) = c_1 x^2 + c_2 x$
        $a_\pm = \dfrac{1 + c_1 + c_2 - c_1 c_2 \pm 2\sqrt{c_1 + c_2 - c_1 c_2}}{(1 - c_1)^2}$

$A_5$:  $l_2(x) = c_1 (x - 1)^2 + c_2 (x - 1)$
        $a_\pm = \dfrac{c_1^2 - c_1 + 2 + c_2 - c_1 c_2 \pm 2\sqrt{c_1 + c_2 - c_1 c_2}}{(1 - c_1)^2}$

$A_6$:  $l_2(x) = c_1 x + x^3 c_1 - 2 c_1 x^2 - c_2 x^3 + c_2 x^2$
        $a_\pm = \dfrac{(1 - c_1)^2}{c_1^2 - c_1 + 2 + c_2 - c_1 c_2 \mp 2\sqrt{c_1 + c_2 - c_1 c_2}}$

Table 9.2. Parameters for determining the limiting eigenvalue density function using (9.7).

Figure 9-1. The limiting density (solid line), $f_{A_6}(x)$, given by (9.7) with $c_1 = 0.1$ and $c_2 = 0.625$
is compared with the normalized histogram of the eigenvalues of a Jacobi matrix generated using the
code in Table 9.1 over 4000 Monte-Carlo trials with $n = 100$, $N_1 = n/c_1 = 1000$ and $N_2 = n/c_2 = 160$.

The moments for the general case when $c_1 \neq c_2$ can be enumerated using the techniques
described; they will be quite messy. Instead, consider the special case when $c_1 = c_2 = c$.
Using the tools described, the leading moments of $J$ can be computed directly from
$L^J_{\mu z}$. Since $M_0 = 1$, they are listed through the shifted moment series
$\mu(z) := (\mu_J(z) - 1)/z = \sum_{n \geq 0} M_{n+1} z^n$, whose first five terms are

$$\mu(z) = \frac{1}{2} + \left(\frac{1}{8} c + \frac{1}{4}\right) z + \left(\frac{3}{16} c + \frac{1}{8}\right) z^2
    + \left(\frac{1}{32} c^2 + \frac{3}{16} c - \frac{1}{128} c^3 + \frac{1}{16}\right) z^3
    + \left(-\frac{5}{256} c^3 + \frac{5}{64} c^2 + \frac{5}{32} c + \frac{1}{32}\right) z^4 + O(z^5).$$

This series satisfies the differential equation

$$-3z + 2 + zc + (-6 z^2 + z^3 + 10 z + z^3 c^2 - 2 z^3 c - 4)\, \mu(z)
    + (z^4 - 5 z^3 - 2 z^4 c + 8 z^2 + z^4 c^2 + 2 z^3 c - 4 z - z^3 c^2)\, \frac{d}{dz} \mu(z) = 0,$$

with the initial condition $\mu(0) = 1/2$. The coefficients $a(n) = M_{n+1}$ themselves are
P-recursive and obtained by the recursion

$$(-2c + c^2 + 1 + (-2c + c^2 + 1)\, n)\, a(n) + ((-5 + 2c - c^2)\, n - 11 + 2c - c^2)\, a(n+1)
    + (26 + 8n)\, a(n+2) + (-16 - 4n)\, a(n+3) = 0,$$

with the initial conditions $a(0) = 1/2$, $a(1) = \frac{1}{8} c + \frac{1}{4}$, and $a(2) = \frac{3}{16} c + \frac{1}{8}$. We
can similarly compute the recursion for the free cumulants, $a(n) = K_{n+1}$, as

$$n c^2 a(n) + (12 + 4n)\, a(n+2) = 0,$$

with the initial conditions $a(0) = 1/2$ and $a(1) = \frac{1}{8} c$.
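The four-term moment recursion for the case $c_1 = c_2 = c$ can be checked independently of the symbolic machinery. The sketch below rests on two assumptions that are stated here rather than taken from the text: the coefficients are read as $a(n) = M_{n+1}$, which is what the initial conditions $a(0) = 1/2$, $a(1) = c/8 + 1/4$, $a(2) = 3c/16 + 1/8$ imply, and the substitution $m(z) = -\mu_J(1/z)/z$ is used to turn (9.6) with $c_1 = c_2 = c$ into the quadratic $c(z - 1)\mu_J^2 - (2 - z)(1 - c)\mu_J + (2 - c) = 0$. The moments are then solved order by order from this quadratic and compared with the recursion. Function names are hypothetical.

```python
from fractions import Fraction

def jacobi_moments_from_quadratic(c, count):
    """Moments M_0..M_{count-1} for c1 = c2 = c, solved order by order from
    c*(z - 1)*mu^2 - (2 - z)*(1 - c)*mu + (2 - c) = 0 on the branch mu(0) = 1."""
    c = Fraction(c)
    M = [Fraction(1)]
    for k in range(1, count):
        # z^(k-1) coefficient of mu^2, and the z^k coefficient without its 2*M_k term
        s_prev = sum(M[i] * M[k - 1 - i] for i in range(k))
        t_k = sum(M[i] * M[k - i] for i in range(1, k))
        # z^k: c*(s_prev - 2*M_k - t_k) - (1 - c)*(2*M_k - M_{k-1}) = 0
        M.append((c * (s_prev - t_k) + (1 - c) * M[k - 1]) / 2)
    return M

def jacobi_moments_from_recursion(c, count):
    """a(n) = M_{n+1}, n = 0..count-1, from the four-term P-recursion."""
    c = Fraction(c)
    p = (1 - c) ** 2
    a = [Fraction(1, 2), c / 8 + Fraction(1, 4), 3 * c / 16 + Fraction(1, 8)]
    n = 0
    while len(a) < count:
        num = (p * (n + 1) * a[n]
               + ((-5 + 2 * c - c * c) * n - 11 + 2 * c - c * c) * a[n + 1]
               + (26 + 8 * n) * a[n + 2])
        a.append(num / (16 + 4 * n))
        n += 1
    return a[:count]

c = Fraction(3, 10)
M = jacobi_moments_from_quadratic(c, 11)
a = jacobi_moments_from_recursion(c, 10)
print(a[:4])
```

With exact rational arithmetic the two lists agree term by term, so any disagreement would point to a transcription error in one of the coefficients rather than to rounding.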

■  9.2  Random  compression  of  a  matrix 


Theorem 9.21. Let $A_N \mapsto A \in \mathcal{M}_{alg}$. Let $Q_N$ be an $N \times N$ Haar unitary/orthogonal
random matrix independent of $A_N$. Let $B_n$ be the upper $n \times n$ block of $Q_N A_N Q_N'$.
Then

$$B_n \mapsto B \in \mathcal{M}_{alg}$$

as $n/N \to c$ for $n, N \to \infty$.

Proof. Let $P_N$ be an $N \times N$ projection matrix

$$P_N = Q_N \begin{bmatrix} I_n & 0 \\ 0 & 0_{N-n} \end{bmatrix} Q_N'.$$

By definition, $P_N$ is an atomic matrix so that $P_N \to P \in \mathcal{M}_{alg}$ as $n/N \to c$ for
$n, N \to \infty$. Let $B_N = P_N \times A_N$. By Corollary 7.50, $B_N \to B \in \mathcal{M}_{alg}$. Finally, from
Theorem 7.36, we have that $B_n \to B \in \mathcal{M}_{alg}$. □

The proof above provides a recipe for computing the bivariate polynomial $L^B_{mz}$
explicitly as a function of $L^A_{mz}$ and the compression factor $c$. For this particular
application, however, one can use first principles [93] to directly obtain the relationship

$$r_B(g) = r_A(c\, g),$$

expressed in terms of the R transform. This translates into the operational law

$$L^B_{rg}(r, g) = L^A_{rg}(r, c\, g).$$    (9.8)


Example: Consider the atomic matrix $A_N$ half of whose eigenvalues are equal to one
while the remainder are equal to zero. Its eigenvalue distribution function is given by
(6.26). From the bivariate polynomial in Table 6.2(a) and (9.8) it can be shown
that the limiting eigenvalue distribution function of $B_n$, constructed from $A_N$ as in
Theorem 9.21, is encoded by the polynomial

$$L^B_{mz} = (-2cz^2 + 2cz)\, m^2 - (-2c + 4cz + 1 - 2z)\, m - 2c + 2,$$

where $c$ is the limiting compression factor. Poles occur at $z = 0$ and $z = 1$. The leading
terms of the Puiseux expansion of the two branches about the poles at $z = z_0$ are

$$\left\{ \left( \frac{z - z_0}{-2c + 4c^2} + \frac{1 - 2c}{2c} \right) \frac{1}{z - z_0}, \quad \frac{2c - 2}{-1 + 2c} \right\}.$$

It can be easily seen that when $c > 1/2$, the Puiseux expansion about the poles $z = z_0$
will correspond to an atom of weight $w_0 = (2c - 1)/2c$. Thus the limiting eigenvalue
distribution function has density

$$f_B(x) = \max\left(\frac{2c - 1}{2c}, 0\right) \delta(x)
    + \frac{1}{\pi} \frac{\sqrt{(x - a_-)(a_+ - x)}}{2xc - 2cx^2}\, I_{[a_-, a_+]}
    + \max\left(\frac{2c - 1}{2c}, 0\right) \delta(x - 1),$$    (9.9)

where $a_\pm = 1/2 \pm \sqrt{-c^2 + c}$. Figure 9-2 compares the theoretical prediction in (9.9)
with a Monte-Carlo experiment for $c = 0.4$.

Figure 9-2. The limiting eigenvalue density function (solid line) of the top $0.4N \times 0.4N$ block of a
randomly rotated matrix is compared with the experimental histogram collected over 4000 trials with
$N = 200$. Half of the eigenvalues of the original matrix were equal to one while the remainder were
equal to zero.

From the associated bivariate polynomial

$$L^B_{\mu z} = (-2c + 2cz)\, \mu^2 + (z - 2 - 2cz + 4c)\, \mu - 2c + 2,$$

we obtain two series expansions whose branches tend to zero. The first four terms of
the series are given by

$$1 + \frac{1}{2} z + \frac{1 + c}{4} z^2 + \frac{1 + 3c}{8} z^3 + O(z^4),$$    (9.10)

and

$$\frac{c - 1}{c} + \frac{c - 1}{2c} z - \frac{(c - 1)(-2 + c)}{4c} z^2 - \frac{(c - 1)(3c - 4)}{8c} z^3 + O(z^4),$$    (9.11)

respectively. Since $c < 1$, the series expansion in (9.11) can be eliminated since
$\mu(0) := \int dF^B(x) = 1$. Thus the coefficients of the series in (9.10) are the correct moments of
the limiting eigenvalue distribution. A recursion for the moments can be readily derived
using the techniques developed earlier.

Figure 9-3. Additive convolution of equilibrium measures corresponding to potentials $V_1(x)$ and $V_2(x)$.
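The moments in (9.10) can also be reached through the free cumulants, using the relationship $r_B(g) = r_A(cg)$. The sketch below assumes the convention $r(g) = \sum_{n \geq 0} K_{n+1} g^n$, so that the relationship reads $K_n(B) = c^{\,n-1} K_n(A)$, and it uses the standard moment-cumulant relation $M(z) = 1 + \sum_{s \geq 1} K_s z^s M(z)^s$; both are assumptions brought in for the illustration. The matrix $A$ has eigenvalues 0 and 1 with equal weight, so $M_0 = 1$ and $M_n = 1/2$ for $n \geq 1$. All function names are hypothetical.

```python
from fractions import Fraction

def series_mul(p, q, order):
    """Product of two truncated power series (coefficient lists), kept up to z^order."""
    out = [Fraction(0)] * (order + 1)
    for i, pi in enumerate(p[:order + 1]):
        for j, qj in enumerate(q[:order + 1 - i]):
            out[i + j] += pi * qj
    return out

def coeff_of_power(M, s, k):
    """Coefficient of z^k in M(z)^s; only M[0..k] is used."""
    power = [Fraction(1)] + [Fraction(0)] * k
    for _ in range(s):
        power = series_mul(power, M[:k + 1], k)
    return power[k]

def moments_to_free_cumulants(M):
    """K[1..n] from M[0..n]; K[0] is a placeholder so that K[s] is the s-th cumulant."""
    K = [Fraction(0)]
    for n in range(1, len(M)):
        K.append(M[n] - sum(K[s] * coeff_of_power(M, s, n - s) for s in range(1, n)))
    return K

def free_cumulants_to_moments(K):
    """M[0..n] from K[1..n] (K[0] is ignored)."""
    M = [Fraction(1)]
    for n in range(1, len(K)):
        M.append(sum(K[s] * coeff_of_power(M, s, n - s) for s in range(1, n + 1)))
    return M

def compressed_moments(c, order):
    """Moments of the compression of a matrix with eigenvalues 0 and 1 in equal proportion."""
    c = Fraction(c)
    moments_A = [Fraction(1)] + [Fraction(1, 2)] * order
    K_A = moments_to_free_cumulants(moments_A)
    K_B = [Fraction(0)] + [c ** (n - 1) * K_A[n] for n in range(1, order + 1)]
    return free_cumulants_to_moments(K_B)

c = Fraction(2, 5)
moments_B = compressed_moments(c, 3)
print(moments_B)
```

Setting $c = 1$ returns the moments of $A$ unchanged, which is a quick sanity check on both conversion routines.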


■  9.3  Free  additive  convolution  of  equilibrium  measures 

Equilibrium measures are a fascinating topic within random matrix theory. They arise
in the context of research that examines why very general random models for random
matrices exhibit universal behavior in the large matrix limit. Suppose we are given a
potential $V(x)$; then we consider a sequence of Hermitian, unitarily invariant random
matrices $A_N$, the joint distribution of whose elements is of the form

$$P(A_N) \propto \exp(-N \,\mathrm{Tr}\, V(A_N))\, dA_N,$$

where $dA_N = \prod_{i \leq j} (dA_N)_{ij}$. The equilibrium measure, when it exists, is the unique
probability distribution function that minimizes the logarithmic energy (see [29] for
additional details). The resulting equilibrium measure depends explicitly on the potential
$V(x)$ and can be explicitly computed for some potentials. In particular, for potentials
of the form $V(x) = t x^{2m}$, the Stieltjes transform of the resulting equilibrium measure
is an algebraic function [29, Chp. 6.7, pp. 174-175] so that the equilibrium measure is
an algebraic distribution. Hence we can formally investigate the additive convolution
of equilibrium measures corresponding to two different potentials. For $V_1(x) = x^2$, the
equilibrium measure is the (scaled) semi-circle distribution encoded by the bivariate
polynomial

$$L^A_{mz} \equiv m^2 + 2mz + 2.$$

For $V_2(x) = x^4$, the equilibrium measure is encoded by the bivariate polynomial

$$L^B_{mz} \equiv \frac{1}{4} m^2 + m z^3 + z^2 + \frac{2}{9} \sqrt{3}.$$

Since $A_N$ and $B_N$ are unitarily invariant random matrices, if $A_N$ and $B_N$ are independent,
then the limiting eigenvalue distribution function of $C_N = A_N + B_N$ can be computed
from $L^A_{mz}$ and $L^B_{mz}$. The limiting eigenvalue density function $f_C(x)$ is the free
additive convolution of $f_A$ and $f_B$. The MATLAB command LmzC = AplusB(LmzA,LmzB);
will produce the bivariate polynomial

$$L^C_{mz} = -9 m^4 - 54 m^3 z + (-108 z^2 - 36)\, m^2 - (72 z + 72 z^3)\, m - 72 z^2 - 16 \sqrt{3}.$$

Figure 9-3 plots the probability density function for the equilibrium measure for the
potentials $V_1(x) = x^2$ and $V_2(x) = x^4$ as well as the free additive convolution of these
measures. The interpretation of the resulting measure in the context of potential
theory is not clear. The matrix $C_N$ will no longer be unitarily invariant so it might not
make sense to look for a potential $V_3(x)$ for which $F^C$ is an equilibrium measure. The tools
and techniques developed in this article might prove useful in further explorations.
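The polynomial for $C_N = A_N + B_N$ can be sanity-checked without the symbolic resultant machinery. The sketch below relies on an assumption that is not derived in the text: adding a free semi-circular element of variance $\sigma^2$ gives $m_C(z) = m_B(z + \sigma^2 m_C(z))$ under the convention $m(z) = \int dF(x)/(x - z)$. Here $\sigma^2 = 1/2$, because $m^2 + 2mz + 2 = 0$ is $\tfrac{1}{2} m^2 + zm + 1 = 0$. Under that assumption $L^C_{mz}(m, z)$ should be proportional to $L^B_{mz}(m, z + m/2)$, and the code checks that the quoted polynomial equals $-72$ times this substitution at random complex points. The function names are hypothetical.

```python
import math
import random

SQRT3 = math.sqrt(3.0)

def L_B(m, z):
    """Polynomial encoding the equilibrium measure for the potential x^4."""
    return 0.25 * m ** 2 + m * z ** 3 + z ** 2 + 2.0 * SQRT3 / 9.0

def L_C(m, z):
    """Polynomial quoted for the free additive convolution."""
    return (-9 * m ** 4 - 54 * m ** 3 * z + (-108 * z ** 2 - 36) * m ** 2
            - (72 * z + 72 * z ** 3) * m - 72 * z ** 2 - 16 * SQRT3)

random.seed(0)
max_gap = 0.0
for _ in range(1000):
    m = complex(random.uniform(-1, 1), random.uniform(-1, 1))
    z = complex(random.uniform(-1, 1), random.uniform(-1, 1))
    # L_C(m, z) should equal -72 * L_B(m, z + m/2) identically in m and z.
    max_gap = max(max_gap, abs(L_C(m, z) + 72 * L_B(m, z + m / 2)))

print(max_gap)   # expected to be at the level of floating-point rounding
```

The shortcut is specific to the semi-circle; for two general algebraic densities the resultant-based routine referred to in the text is still needed.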

■  9.4  Other  applications 

There is often a connection between well-known combinatorial numbers and random
matrices. For example, the even moments of the Wigner matrix are the famous Catalan
numbers. Similarly, if $W_N(c)$ denotes the Wishart matrix with parameter $c$, other
combinatorial correspondences can be easily established using the techniques developed.
For instance, the limiting moments of $W_N(1) - I_N$ are the Riordan numbers, the large
Schröder numbers correspond to the limiting moments of $2W_N(0.5)$ while the small
Schröder numbers are the limiting moments of $4W_N(0.125)$. Combinatorial identities
along the lines of those developed in [33] might result from these correspondences.

Chapter 10


The polynomial method:
Eigenvectors of random matrices

Proposition 8.41 succinctly captures an important connection between free probability
and random matrices. Specifically, free probability provides the analytic machinery
for computing the limiting eigenvalue distribution of $A_N + B_N$ and $A_N B_N$ from the
limiting eigenvalue distribution of $A_N$ and $B_N$ when they are asymptotically free. A
less well-known fact is that it also provides us with a machinery for computing the
limiting conditional "eigenvector distribution" of the eigenvectors of $A_N + B_N$.

Note that if $B_N$ is small (in some appropriate norm sense) compared to $A_N$ then
the eigenvectors of $A_N + B_N$ should be close to those of $A_N$ so that standard
perturbation theory as in [90] should be able to adequately describe the transformation
in the eigenvectors. The power of the free probabilistic framework is that it makes no
assumptions on the relative norms of $A_N$ and $B_N$ except that their limiting eigenvalue
distributions exist. The machinery for analytically characterizing the eigenvectors was
developed by Biane in [15] in the context of his investigation of processes with free
increments. The applicability of these results for describing the conditional "eigenvector
distribution" is mentioned in [16, pp. 70].

In this chapter, we summarize Biane's relevant results from [15], and define the
subclass of algebraic random matrices for which the conditional "eigenvector distribution"
is algebraic as well. As before, algebraicity of this subclass acts as a certificate of the
computability of the limiting conditional "eigenvector distribution."

■  10.1  The  conditional  "eignenvector  distribution" 

Consider the random matrices $A \equiv A_N$ and $B \equiv B_N$ with limiting eigenvalue
distribution functions given by $F^A$ and $F^B$, respectively. Let $u_1, \ldots, u_N$ and
$v_1, \ldots, v_N$ be the eigenvectors of $A_N$ and $A_N + B_N$, associated with the eigenvalues
$\lambda_1^{A}, \ldots, \lambda_N^{A}$ and $\lambda_1^{A+B}, \ldots, \lambda_N^{A+B}$, respectively.

The passage from the old basis to the new basis is given by the $N \times N$ transition
matrix whose $(i, j)$-th entry is the projection $\langle v_i, u_j \rangle$. Since the eigenvectors are only
defined up to a complex number of modulus one, Biane considers the numbers
$|\langle v_i, u_j \rangle|^2$, which form a bistochastic matrix.

While it is not meaningful to speak of the limit of the entries of the bistochastic
matrix itself, it does make sense to ask if the entries have some definite asymptotic
behavior as $N \to \infty$. Let $g$ and $h$ be smooth functions on $\mathbb{R}$ and consider the asymptotic
behavior of the expression

$$\frac{1}{N} \mathrm{Tr}\left( g(A_N)\, h(A_N + B_N) \right) = \frac{1}{N} \sum_{1 \le i, j \le N} h(\lambda_i^{A+B})\, g(\lambda_j^{A})\, |\langle v_i, u_j \rangle|^2. \tag{10.1}$$

In what follows it is established how the conditional “eigenvector distribution” is
encoded by a Markov transition kernel density function.
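The identity in (10.1) can be checked by hand on a very small case. The following Python sketch does so for a single 2 x 2 example, where the eigenvectors of a symmetric matrix are available in closed form. The particular matrices, the test functions $g(t) = t^3$ and $h(t) = t^2$, and all function names are illustrative assumptions, not part of the original development. Taking $h$ to be a polynomial lets the left hand side be formed by direct matrix multiplication, independently of the eigen-decomposition used on the right hand side.

```python
import math


def eig_sym_2x2(c11, c12, c22):
    """Eigenvalues and orthonormal eigenvectors of [[c11, c12], [c12, c22]]
    obtained from a single Jacobi rotation."""
    theta = 0.5 * math.atan2(2.0 * c12, c11 - c22)
    cs, sn = math.cos(theta), math.sin(theta)
    lam1 = c11 * cs * cs + 2.0 * c12 * cs * sn + c22 * sn * sn
    lam2 = c11 * sn * sn - 2.0 * c12 * cs * sn + c22 * cs * cs
    return [lam1, lam2], [(cs, sn), (-sn, cs)]


def check_identity(a_diag, b):
    """Return (lhs, rhs, P) for (10.1) with A = diag(a_diag) and symmetric B = b.

    P[i][j] = |<v_i, u_j>|^2, where u_j are the standard basis vectors
    (the eigenvectors of the diagonal matrix A)."""
    c11 = a_diag[0] + b[0][0]
    c12 = b[0][1]
    c22 = a_diag[1] + b[1][1]
    lam, vecs = eig_sym_2x2(c11, c12, c22)
    P = [[v[j] ** 2 for j in range(2)] for v in vecs]

    def g(t):
        return t ** 3

    def h(t):
        return t ** 2

    # Left side: (1/N) Tr(g(A) h(C)); for h(t) = t^2 the diagonal of C^2 is explicit.
    c_sq_diag = [c11 * c11 + c12 * c12, c12 * c12 + c22 * c22]
    lhs = 0.5 * sum(g(a_diag[j]) * c_sq_diag[j] for j in range(2))
    # Right side: (1/N) sum_{i,j} h(lambda_i^{A+B}) g(lambda_j^A) |<v_i, u_j>|^2.
    rhs = 0.5 * sum(h(lam[i]) * g(a_diag[j]) * P[i][j]
                    for i in range(2) for j in range(2))
    return lhs, rhs, P


lhs, rhs, P = check_identity((5.0, 1.0), [[0.3, 0.8], [0.8, -0.2]])
print(lhs, rhs)
print([sum(row) for row in P], [P[0][j] + P[1][j] for j in range(2)])
```

The two printed traces agree up to rounding, and the rows and columns of P each sum to one, which is the bistochastic property referred to above.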


Proposition 10.11. Let $A_N$ and $B_N$ be asymptotically free sequences of random
matrices that satisfy the hypotheses in Proposition 8.41. Let $g$ and $h$ be smooth functions
on $\mathbb{R}$. If $A_N$ and $B_N$ are chosen at random, then

$$\frac{1}{N} \mathrm{Tr}\left( h(A_N + B_N)\, g(A_N) \right) \to \int g(x)\, h(y)\, \rho_{A+B}(x, y)\, dx\, dy, \tag{10.2a}$$

$$\frac{1}{N} \mathrm{Tr}\left( h(A_N^{1/2} B_N A_N^{1/2})\, g(A_N) \right) \to \int g(x)\, h(y)\, \rho_{AB}(x, y)\, dx\, dy, \tag{10.2b}$$

where the convergence is in probability as $N \to \infty$ and $\rho_{A+B}$ and $\rho_{AB}$ are bivariate
probability density functions on $\mathbb{R}^2$ that can be decomposed as

$$\rho_{A+B}(x, y) = k_{A+B|A}(x, y)\, f_A(x), \tag{10.3a}$$

$$\rho_{AB}(x, y) = k_{AB|A}(x, y)\, f_A(x), \tag{10.3b}$$

where $f_A := dF^A(x)$ is the limiting eigenvalue density function of $A_N$ and $k_{A+B|A}(x, y)$
and $k_{AB|A}(x, y)$ are Markov transition kernel density functions.


PROOF. This result appears in Biane [15] in the context of processes with free
increments. The connection with eigenvectors is mentioned in [16, pp. 70]. □


The Markov transition kernels obtained may be intuitively thought of as the limit
of the bistochastic matrix $|\langle v_i, u_j \rangle|^2$ that appears on the right hand side of (10.1). The
propositions that follow describe the procedure for computing these Markov transition
kernels.


Proposition 10.12. Let $k_{A+B|A}(x, y)$ be the Markov transition kernel density function
as defined in Proposition 10.11. Then $k_{A+B|A}(x, y)$ is the probability density function
on $\mathbb{R} \times \mathbb{R}$ with support $S_A \times S_{A+B}$ associated with the analytic function $q$ defined on
$\mathbb{C}^+ \setminus \mathbb{R}$. Both are uniquely determined by the relations

$$G_{A+B|A}(x, y) = \int \frac{1}{y - z}\, k_{A+B|A}(x, z)\, dz, \tag{10.4a}$$

$$G_{A+B|A}(x, y) = \frac{1}{q(y) - x}, \tag{10.4b}$$

$$g_A(q(y)) = g_{A+B}(y), \tag{10.4c}$$

for all $y \in \mathbb{C}^+ \setminus \mathbb{R}$.

Proof. These relations are derived in [15, pp. 151-153]. □

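Proposition 10.13. Let $k_{AB|A}(x, y)$ be the Markov transition kernel as defined in
Proposition 10.11. Then $k_{AB|A}(x, y)$ is the probability density function on $\mathbb{R} \times \mathbb{R}$ with
support $S_A \times S_{AB}$ associated with the analytic function $\eta$ defined on $\mathbb{C}^+ \setminus \mathbb{R}$. Both are
uniquely determined by the relations

$$G_{AB|A}(x, y) = \int \frac{1}{y - z}\, k_{AB|A}(x, z)\, dz, \tag{10.5a}$$

$$G_{AB|A}(x, y) = \frac{1}{y - \eta(1/y)\, x\, y}, \tag{10.5b}$$

$$\frac{1}{\eta(1/y)}\, g_A\!\left( \frac{1}{\eta(1/y)} \right) = y\, g_{AB}(y), \tag{10.5c}$$

for all $y \in \mathbb{C}^+ \setminus \mathbb{R}$.

Proof. These relations are derived in [15, pp. 158]. □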


■  10.2  Algebraic  conditional  “eigenvector  distributions” 

A closer inspection of the analytical procedures, described in Propositions 10.12 and
10.13, for computing the Markov transition kernels $k_{A+B|A}$ and $k_{AB|A}$ reveals the
difficulty of concretely computing these kernels. Specifically, the conditions in (10.4c) and
(10.5c) will be satisfied by a function $q$ that can be expressed in closed form in only
some special cases. However, when the probability density functions $f_A$ and $f_{A+B}$ (or
$f_{AB}$) are algebraic, so that we can encode their Stieltjes (or Cauchy) transform as a
solution of a bivariate polynomial equation, the Markov transition kernels can be readily
computed.

Remark 10.21 (Terminology). We shall often informally use the phrase conditional
“eigenvector distribution” when referring to the Markov transition kernel that emerges
from Proposition 10.11. The phrase “eigenvector distribution” is enclosed in quotes
because the kernel characterization is not a distribution in the usual sense of the word,
i.e., it does not describe the probability distribution of the eigenvectors of $A_N + B_N$. It
is qualified by affixing the label conditional because, in the sense of Proposition 10.11,
it encodes how the eigenvectors of $A_N + B_N$ (or $A_N B_N$) are related to the
eigenvectors of $A_N$.

Notation 10.22 (Trivariate polynomial). Let $L_{uvw}$ denote a trivariate polynomial
of degree $D_u$ in $u$, $D_v$ in $v$ and $D_w$ in $w$ given by

$$L_{uvw} \equiv L_{uvw}(u, v, w) = \sum_{i=0}^{D_u} \sum_{j=0}^{D_v} \sum_{k=0}^{D_w} c_{ijk}\, u^i v^j w^k. \tag{10.6}$$

The scalar coefficients $c_{ijk}$ are real valued.

Definition 10.23 (Algebraic Markov transition kernels). Let $k(x, y)$ be a Markov
transition kernel density function. Consider the analytic function $G(x, y)$ defined on
$\mathbb{C}^+ \setminus \mathbb{R} \times \mathbb{C}^+ \setminus \mathbb{R}$ as

$$G(x, y) = \int \frac{1}{y - z}\, k(x, z)\, dz. \tag{10.7}$$

If there exists a trivariate polynomial $L_{Gxy}$ such that $L_{Gxy}(G(x, y), x, y) = 0$ then we
refer to $k(x, y)$ as an algebraic Markov transition kernel and say that $k(x, y) \in \mathcal{K}_{alg}$.
Here $\mathcal{K}_{alg}$ denotes the class of algebraic Markov transition kernels.

Remark 10.24 (Equivalent representation). Let $k(x, y)$ be a Markov transition
kernel density function. Consider the analytic function $M(x, y)$ defined on
$\mathbb{C}^+ \setminus \mathbb{R} \times \mathbb{C}^+ \setminus \mathbb{R}$ as

$$M(x, y) = \int \frac{1}{z - y}\, k(x, z)\, dz. \tag{10.8}$$

The function $M(x, y)$ is related to $G(x, y)$, defined in (10.7), as $M(x, y) = -G(x, y)$.
If $k(x, y) \in \mathcal{K}_{alg}$ then $L_{Gxy}$ exists so that $L_{Mxy}$ exists and is given by

$$L_{Mxy}(M, x, y) = L_{Gxy}(-M, x, y). \tag{10.9}$$


Remark 10.25 (Property of Markov transition kernels). Let $k(x, y)$ be a Markov
transition kernel with support $S_x \times S_y$. Then, by definition, for every $x_0 \in S_x$, $k(x_0, y)$
is a positive probability density function on $S_y$ and $M(x_0, y)$ is its Stieltjes transform.
Similarly, for every $y_0 \in S_y$, $k(x, y_0)$ is a positive probability density function with
support on $S_x$ and $M(x, y_0)$ is its Stieltjes transform.

Remark 10.26 (Property of algebraic Markov transition kernels). If $k(x, y) \in \mathcal{K}_{alg}$
with support $S_x \times S_y$ then for every $x_0 \in S_x$ and $y_0 \in S_y$ it follows that
$k(x_0, y) \in \mathcal{P}_{alg}$ and $k(x, y_0) \in \mathcal{P}_{alg}$.

The main result, stated below, is that the Markov transition kernel that emerges
when characterizing the eigenvectors of the sum and product of asymptotically free
algebraic random matrices is algebraic as well. The value of this statement is that when
combined with Remark 10.26 it allows us to concretely compute the Markov transition
kernel numerically using the techniques discussed in Chapter 8.

Theorem 10.27. Let $A_N$ and $B_N$ be asymptotically free random matrices that satisfy
the hypothesis in Proposition 8.41. If $f_A \in \mathcal{P}_{alg}$ and $f_B \in \mathcal{P}_{alg}$ and $k_{A+B|A}$ is a Markov
transition kernel as defined in Proposition 10.11 then $k_{A+B|A} \in \mathcal{K}_{alg}$.


Proof. Since $f_A \in \mathcal{P}_{alg}$ and $f_B \in \mathcal{P}_{alg}$, from Corollary 8.42, $f_{A+B} \in \mathcal{P}_{alg}$. Hence
$L^{A}_{gz}$ and $L^{A+B}_{gz}$ exist. Equation (10.4c) implies that the set of polynomial equations
$L^{A}_{gz}(g, q) = 0$ and $L^{A+B}_{gz}(g, y) = 0$ share a common solution. Hence the resultant, given
by Definition 6.41, of the polynomials will equal zero, i.e., $L^{A+B|A}_{qy}(q, y) = 0$ where

$$L^{A+B|A}_{qy}(q, y) = \mathrm{Res}_g\left( L^{A}_{gz}(g, q),\; L^{A+B}_{gz}(g, y) \right). \tag{10.10}$$

Equation (10.4b) yields the relationship

$$q(y) = x + \frac{1}{G(x, y)}. \tag{10.11}$$

Thus $G(x, y)$ is a solution of the trivariate polynomial equation
$L^{A+B|A}_{Gxy}(G, x, y) = 0$ where

$$L^{A+B|A}_{Gxy}(G, x, y) = G^{D_q}\, L^{A+B|A}_{qy}\!\left( x + \frac{1}{G},\; y \right) \tag{10.12}$$

is the polynomial obtained by clearing the denominator or, equivalently, multiplying
by $G^{D_q}$ where $D_q$ is the degree of $q$ in $L^{A+B|A}_{qy}$. The trivariate
polynomial thus obtained proves that $k_{A+B|A}(x, y) \in \mathcal{K}_{alg}$. □


Theorem 10.28. Let $A_N$ and $B_N$ be asymptotically free random matrices that satisfy
the hypothesis in Proposition 8.41. If $f_A \in \mathcal{P}_{alg}$ and $f_B \in \mathcal{P}_{alg}$ and $k_{AB|A}$ is a Markov
transition kernel as defined in Proposition 10.11 then $k_{AB|A} \in \mathcal{K}_{alg}$.


PROOF. Since $f_A \in \mathcal{P}_{alg}$ and $f_B \in \mathcal{P}_{alg}$, from Corollary 8.43, $f_{AB} \in \mathcal{P}_{alg}$. Hence
$L^{A}_{gz}$ and $L^{AB}_{gz}$ exist. The function $q(y) := \eta(1/y)$ given by the relation (10.5c) is an
algebraic function, i.e., it satisfies the algebraic equation $L^{AB|A}_{qy}(q, y) = 0$. The bivariate
polynomial $L^{AB|A}_{qy}$ is obtained as follows. First we obtain the bivariate polynomial
$\tilde{L}^{A}_{gq}$ given by

$$\tilde{L}^{A}_{gq}(g, q) = q^{D^{A}_{z}}\, L^{A}_{gz}(q g,\; 1/q), \tag{10.13}$$

where $D^{A}_{z}$ is the degree of $z$ in the polynomial $L^{A}_{gz}$. Equation (10.5c) implies that the
set of polynomial equations $\tilde{L}^{A}_{gq}(g, q) = 0$ and $L^{AB}_{gz}(g/y, y) = 0$ share a common solution.
Hence the resultant, given by Definition 6.41, of the polynomials will equal zero, i.e.,
$L^{AB|A}_{qy}(q, y) = 0$ where

$$L^{AB|A}_{qy}(q, y) = \mathrm{Res}_g\left( \tilde{L}^{A}_{gq}(g, q),\; y^{D^{AB}_{g}}\, L^{AB}_{gz}(g/y,\; y) \right) \tag{10.14}$$

and $D^{AB}_{g}$ is the degree of $g$ in the polynomial $L^{AB}_{gz}$. Equation (10.5b) yields the relationship

$$q(y) = \eta(1/y) = \frac{1}{x} - \frac{1}{x\, y\, G(x, y)}. \tag{10.15}$$

Hence, $G(x, y)$ satisfies the trivariate polynomial equation $L^{AB|A}_{Gxy}(G, x, y) = 0$ where

$$L^{AB|A}_{Gxy}(G, x, y) = (G x y)^{D_q}\, L^{AB|A}_{qy}\!\left( \frac{1}{x} - \frac{1}{x\, y\, G},\; y \right) \tag{10.16}$$

is the polynomial obtained by clearing the denominator of the rational function
obtained, or equivalently multiplying by $(G x y)^{D_q}$ where $D_q$ is the
degree of $q$ in $L^{AB|A}_{qy}$. The trivariate polynomial thus obtained proves that
$k_{AB|A}(x, y) \in \mathcal{K}_{alg}$. □


Corollary 10.29. Let $A_N$ and $B_N$ be asymptotically free algebraic random matrices.
Then

• $k_{A+B|A} \in \mathcal{K}_{alg}$ and $k_{A+B|B} \in \mathcal{K}_{alg}$,

• $k_{AB|A} \in \mathcal{K}_{alg}$ and $k_{AB|B} \in \mathcal{K}_{alg}$.

PROOF. The first part of the statement follows directly from Theorem 10.27. The
kernels $k_{A+B|B}$ and $k_{A+B|A}$ will generically be different unless $f_A = f_B$ almost everywhere.
The second part of the statement follows directly from Theorem 10.28. The kernels
$k_{AB|B}$ and $k_{AB|A}$ will generically be different unless $f_A = f_B$ almost everywhere. □

The proofs of Theorems 10.27 and 10.28 reveal the symbolic code needed to compute
the trivariate polynomial that encodes the Markov transition kernel. These are listed
in Table 10.1. This allows us to identify the subclass of algebraic random matrices for
which the conditional “eigenvector distribution” is algebraic, in the sense of having
algebraic Markov transition kernels.

Theorem 10.210. Sums and (admissible) products of asymptotically free algebraic
random matrices have conditional “eigenvector distributions” that are encoded by algebraic
Markov transition kernels.

Matlab Code

function [LmxyApB,LmzApB] = AplusBkernel(LmzA,LmzB)
syms m g q x y z
LgzA = Lmz2Lgz(LmzA);                       % Stieltjes -> Cauchy transform polynomial of A
LmzApB = AplusB(LmzA,LmzB);                 % eigenvalue polynomial of A+B
LgzApB = Lmz2Lgz(LmzApB);
LgqA = subs(LgzA,z,q);                      % L^A_gz(g,q)
LgyApB = subs(LgzApB,z,y);                  % L^{A+B}_gz(g,y)
LqyApB = maple('resultant',LgqA,LgyApB,g);  % eliminate g, see (10.10)
LgxyApB = subs(LqyApB,q,x+1/g);             % q = x + 1/G, see (10.11)
LgxyApB = irreducLuv(LgxyApB,g,y);          % clear denominators
LmxyApB = subs(LgxyApB,g,-m);               % M = -G, see (10.9)

(a) $L^{A}_{mz}, L^{B}_{mz} \mapsto L^{A+B|A}_{mxy}$ for $A, B \mapsto A + QBQ'$.

Matlab Code

function [LmxyAtB,LmzAtB] = AtimesBkernel(LmzA,LmzB)
syms m g q x y z
LgzA = Lmz2Lgz(LmzA);
LmzAtB = AtimesB(LmzA,LmzB);                % eigenvalue polynomial of AB
LgzAtB = Lmz2Lgz(LmzAtB);
LgqA = irreducLuv(subs(LgzA,{g,z},{g*q,1/q}),g,q);     % see (10.13)
LgyAtB = irreducLuv(subs(LgzAtB,{g,z},{g/y,y}),g,y);
LqyAtB = maple('resultant',LgqA,LgyAtB,g);  % eliminate g, see (10.14)
LgxyAtB = subs(LqyAtB,q,1/x-1/(y*g*x));     % see (10.15)
LgxyAtB = irreducLuv(LgxyAtB,g,y);          % clear denominators
LmxyAtB = subs(LgxyAtB,g,-m);               % M = -G, see (10.9)

(b) $L^{A}_{mz}, L^{B}_{mz} \mapsto L^{AB|A}_{mxy}$ for $A, B \mapsto A \times QBQ'$.

Table 10.1. Symbolic code in Matlab for computing the trivariate polynomial that encodes the
Markov transition kernel that characterizes the conditional “eigenvector distribution” of sums and
products of algebraic random matrices.

■ 10.3 Example

Suppose $A_N$ is the $N \times N$ matrix

$$A_N = \begin{bmatrix} 5\, I_{N/2} & 0 \\ 0 & I_{N/2} \end{bmatrix} \tag{10.17}$$

where we have assumed without loss of generality, for the eigenvector discussion to
follow, that $A_N$ is a diagonal matrix. There are two eigenspaces associated with this
matrix. We think of the eigenspace associated with the eigenvalue equal to 5 as being
the “signal” subspace. Consider the Wishart matrix constructed from an $N \times L$ random
matrix $G_N$ with standard normal entries as

$$W_N = \frac{1}{L}\, G_N G_N'. \tag{10.18}$$

We employ the machinery developed to describe the eigenvectors of the matrix

$$C_N = A_N + e\, W_N \tag{10.19}$$

as a function of $e$ (a mnemonic for $\epsilon$) in the $N \to \infty$ limit when $N = 2L$. Note that this
choice of $N$ and $L$ makes $W_N$ singular with rank $L$ (with high probability). Both $A_N$
and $W_N$ are algebraic random matrices. The limiting eigenvalue distribution function
of $A_N$ has Stieltjes transform

$$m_A(z) = \frac{0.5}{5 - z} + \frac{0.5}{1 - z},$$

which is the solution of the equation $L^{A}_{mz}(m, z) = 0$ where

$$L^{A}_{mz}(m, z) = m (5 - z)(1 - z) - (3 - z). \tag{10.20}$$

The limiting eigenvalue distribution function of $W_N$ has Stieltjes transform that is the
solution of the equation $L^{W}_{mz}(m, z) = 0$ where

$$L^{W}_{mz}(m, z) = 2 z m^2 + (1 + z) m + 1, \tag{10.21}$$

is obtained by plugging $c = N/L = 2$ into the appropriate polynomial in Table 6.2(b).
Let $B_N = e\, W_N$. By Corollary 7.32, $B_N$ is algebraic. Since $W_N$ has Haar distributed
eigenvectors, it is orthogonally invariant. This makes $B_N$ asymptotically free with
respect to $A_N$ so that, by Corollary 7.56 and Theorem 10.210, $C_N = A_N + B_N$ has an
algebraic conditional “eigenvector distribution.”

To predict the distortion in the “signal” subspace of $A_N$ when additively perturbed
by $B_N$ we compute $k_{A+B|A}(x, y)$ and evaluate the Markov transition kernel density
function at $x = 1$ and $x = 5$.

Using the Matlab tools developed, the trivariate polynomial that encodes the
Markov transition kernel is computed using the sequence of commands:

>> syms m x y z e
>> LmzA = m - 0.5/(5-z) - 0.5/(1-z);
>> LmzA = irreducLuv(LmzA,m,z);
>> LmzW = 2*z*m^2+(1+z)*m+1;
>> LmzB = scaleA(LmzW,e);
>> [LmxyC,LmzC] = AplusBkernel(LmzA,LmzB);

from which we obtain the bivariate polynomial

$$L^{C}_{mz}(m, z) = (20 e^2 + 4 e^2 z^2 - 24 e^2 z)\, m^3 + (-24 z e + 20 e + 4 e z^2)\, m^2 + (5 + z^2 - e^2 - 6 z + 2 z e - 6 e)\, m - 3 - e + z \tag{10.22}$$

and the trivariate polynomial

$$L^{C}_{mxy}(m, x, y) = (-5x - x^3 - 5e + 6x^2 + 6ye - 6yx + ex^2 + yx^2 + 5y - 2yex)\, m^3 + (6y + 2ye - 2yx + 3x^2 + 5 - 12x - 2ex)\, m^2 + (6 + e + y - 3x)\, m + 1. \tag{10.23}$$
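The construction in the proof of Theorem 10.27 can be replayed numerically at a single point as a consistency check on the two polynomials above. The Python sketch below (Python rather than the Matlab toolbox of Table 10.1, so that it is self-contained) solves (10.22) for the Stieltjes transform of $C$ at a point $y$ off the real axis, recovers $q(y)$ from (10.4c) using $g_A(q) = (q - 3)/((q - 5)(q - 1))$, forms $M = -G = -1/(q - x)$ as in (10.4b) and (10.9), and evaluates (10.23). The evaluation point, the Newton starting guess $-1/y$, the rule of picking the root of the quadratic closest to $y$, and all function names are assumptions made for illustration.

```python
import cmath


def L_C(m, z, e):
    """Bivariate polynomial (10.22), written with D = (z - 1)(z - 5)."""
    D = (z - 1) * (z - 5)
    return (4 * e * e * D * m ** 3 + 4 * e * D * m ** 2
            + (D + 2 * e * (z - 3) - e * e) * m + (z - 3 - e))


def dL_C(m, z, e):
    """Derivative of (10.22) with respect to m."""
    D = (z - 1) * (z - 5)
    return 12 * e * e * D * m ** 2 + 8 * e * D * m + (D + 2 * e * (z - 3) - e * e)


def stieltjes_C(z, e, iters=50):
    """Newton iteration on (10.22), started from the large-|z| behaviour m ~ -1/z."""
    m = -1.0 / z
    for _ in range(iters):
        m = m - L_C(m, z, e) / dL_C(m, z, e)
    return m


def subordination_q(g, y):
    """Solve g_A(q) = g, i.e. g q^2 - (6 g + 1) q + (5 g + 3) = 0, for the root near y."""
    disc = cmath.sqrt((6 * g + 1) ** 2 - 4 * g * (5 * g + 3))
    roots = [((6 * g + 1) + s * disc) / (2 * g) for s in (1, -1)]
    return min(roots, key=lambda q: abs(q - y))


def L_Mxy(m, x, y, e):
    """Trivariate polynomial (10.23)."""
    p3 = (-5 * x - x ** 3 - 5 * e + 6 * x ** 2 + 6 * y * e - 6 * y * x
          + e * x ** 2 + y * x ** 2 + 5 * y - 2 * y * e * x)
    p2 = 6 * y + 2 * y * e - 2 * y * x + 3 * x ** 2 + 5 - 12 * x - 2 * e * x
    p1 = 6 + e + y - 3 * x
    return p3 * m ** 3 + p2 * m ** 2 + p1 * m + 1


def kernel_transform(x, y, e):
    """M(x, y) = -1 / (q(y) - x), with q(y) obtained from the Cauchy transform of C."""
    g = -stieltjes_C(y, e)          # Cauchy transform is minus the Stieltjes transform
    q = subordination_q(g, y)
    return -1.0 / (q - x)


e, y = 2.0, complex(20.0, 5.0)
for x in (1.0, 5.0):
    print(x, abs(L_Mxy(kernel_transform(x, y, e), x, y, e)))
```

The printed residuals are at the level of rounding error, which is what one expects if (10.22) and (10.23) are related through (10.10) to (10.12).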

Figure 10-1 compares the density function associated with the limiting eigenvalue
distribution of $C_N$ for different values of $e$. These curves were computed using the techniques
described in Section 8.2 from the bivariate polynomial $L^{C}_{mz}$. The curves reveal the
extent of the distortion in the eigen-spectrum of $A_N$ induced by the moderate rank perturbation
$B_N = e\, W_N$. As $e \to 0$, the distortion lessens and the limiting eigenvalue distribution
of $C_N$ will resemble that of $A_N$.

The Stieltjes transform of the density function $k(1, y)$ is the solution of the algebraic
equation $L^{C}_{mxy}(m, 1, y) = 0$, i.e.,

$$(-4e + 4ye)\, m^3 + (4y + 2ye - 4 - 2e)\, m^2 + (3 + e + y)\, m + 1 = 0. \tag{10.24}$$

We can compute the density and the moments of $k(1, y)$ from (10.24) by using the
techniques described in Chapter 8. Its first 4 moments are, respectively,

$1 + e$

$1 + 2e + 3e^2$

$1 + 3e + 13e^2 + 11e^3$

$1 + 4e + 50e^2 + 76e^3 + 45e^4$
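The moments of $k(1, y)$ can be recovered from (10.24) by a series expansion, which the following Python sketch carries out with the coefficients kept as polynomials in $e$. It substitutes $y = 1/w$, multiplies (10.24) by $w$, and solves order by order for $m(w) = -(w + M_1 w^2 + M_2 w^3 + \cdots)$. This is a minimal sketch of one way to do the expansion, not the procedure of Chapter 8; the data layout (coefficient lists in $e$, lowest degree first) and the function names are assumptions.

```python
def trim(a):
    """Drop trailing zero coefficients of a polynomial in e."""
    a = list(a)
    while len(a) > 1 and a[-1] == 0:
        a.pop()
    return a


def padd(a, b):
    """Sum of two polynomials in e (coefficient lists, lowest degree first)."""
    n = max(len(a), len(b))
    return trim([(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
                 for i in range(n)])


def pmul(a, b):
    """Product of two polynomials in e."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return trim(out)


def smul(s, t, order):
    """Product of two power series in w, truncated after w**order."""
    out = [[0] for _ in range(order + 1)]
    for i in range(order + 1):
        for j in range(order + 1 - i):
            out[i + j] = padd(out[i + j], pmul(s[i], t[j]))
    return out


# w * (10.24) with y = 1/w:  F(m, w) = sum of coeff(e) * m**j * w**k over keys (j, k).
F = {
    (3, 0): [0, 4], (3, 1): [0, -4],     # 4e(1 - w) m^3
    (2, 0): [4, 2], (2, 1): [-4, -2],    # (4 + 2e)(1 - w) m^2
    (1, 0): [1], (1, 1): [3, 1],         # (1 + (3 + e) w) m
    (0, 1): [1],                         # w
}


def kernel_moments(count):
    """First `count` moments of k(1, y), each a coefficient list in e."""
    order = count + 1
    m = [[0] for _ in range(order + 1)]
    for t in range(1, order + 1):
        total = [[0] for _ in range(order + 1)]
        power = [[1]] + [[0] for _ in range(order)]      # m**0 as a series in w
        for j in range(4):
            for (jj, k), coeff in F.items():
                if jj == j:
                    for r in range(order + 1 - k):
                        total[r + k] = padd(total[r + k], pmul(coeff, power[r]))
            power = smul(power, m, order)
        # Only the term 1 * m carries the unknown w**t coefficient, with slope 1.
        m[t] = [-v for v in total[t]]
    return [trim([-v for v in m[t + 1]]) for t in range(1, count + 1)]


for moment in kernel_moments(4):
    print(moment)
```

Each printed list holds the coefficients of $1, e, e^2, \ldots$ of one moment; the fourth is [1, 4, 50, 76, 45], i.e. $1 + 4e + 50e^2 + 76e^3 + 45e^4$.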

Similarly, the Stieltjes transform of the density function $k(5, y)$ is a solution of the
algebraic equation $L^{C}_{mxy}(m, 5, y) = 0$, i.e.,

$$(20e - 4ye)\, m^3 + (-4y + 2ye + 20 - 10e)\, m^2 + (-9 + e + y)\, m + 1 = 0. \tag{10.25}$$

Figure 10-1. Limiting eigenvalue density function of $A_N + e\, W_N$ for different values of $e$.

We can similarly recover the density function $k(5, y)$ and its moments from (10.25).
From Proposition 10.11, the relationship between the limiting eigenvalue distribution
of $C = A + B$ and the kernel density functions $k(1, y)$ and $k(5, y)$ can be deduced. In
general, for $k(x, y) = k_{A+B|A}(x, y)$ the relationship

$$dF^{A+B}(y) = \int k(x, y)\, dF^{A}(x), \tag{10.26}$$

reduces, in our case, to

$$dF^{A+B} = 0.5\, k(1, y) + 0.5\, k(5, y). \tag{10.27}$$

Figure 10-2(a) illustrates the relationship in (10.27). Figure 10-2(b) compares the
function $0.5\, k(5, \cdot)$ with the weighted empirical histogram of the eigenvalues of $C_N$, collected
over 4000 trials with $N = 100 = 2L$. The weight used to compute the histogram is
the norm square of the projection of each eigenvector of $C_N$ onto the “signal” subspace,
i.e., for $A_N$ given by (10.17), onto the subspace spanned by the columns of the $N \times N/2$
matrix with ones along the diagonal and zeros elsewhere.

Figure 10-2(a) provides insight into how the eigenvectors of $A + 2W$ are related
to the eigenvectors of $A$. For a large enough $N$, suppose we obtain an eigenvalue
of magnitude approximately 4.25 (where the curve representing $0.5\, k(5, \cdot)$ intersects
with the curve representing $0.5\, k(1, \cdot)$). What Figure 10-2(a) conveys is that the
corresponding eigenvector of $A + 2W$ will have a projection of equal norm onto each of the
eigenspaces of $A$.

In other words, conditioned on an eigenvalue of $A + 2W$ having magnitude $z$, the
projection of the corresponding eigenvector onto the eigenspace of $A$ spanned by the
eigenvalue of magnitude 5 will have square norm that will be very well approximated
(for large $N$) by the expression

$$\frac{0.5\, k(5, z)}{0.5\, k(1, z) + 0.5\, k(5, z)}.$$
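The projection ratio above, and the decomposition (10.27), can be evaluated pointwise from the cubics (10.22), (10.24) and (10.25). The Python sketch below solves each cubic with Cardano's formula and takes the density to be the largest imaginary part among the roots divided by $\pi$, which assumes that for real $z$ inside the support the Stieltjes transform is the root with positive imaginary part. The function names and the choice of Cardano's formula are illustrative assumptions; they are not the method of Section 8.2.

```python
import cmath
import math


def cubic_roots(a3, a2, a1, a0):
    """All three roots of a3 m^3 + a2 m^2 + a1 m + a0 = 0 (Cardano's formula)."""
    b, c, d = a2 / a3, a1 / a3, a0 / a3
    p = c - b * b / 3.0
    q = 2.0 * b ** 3 / 27.0 - b * c / 3.0 + d
    disc = cmath.sqrt(q * q / 4.0 + p ** 3 / 27.0)
    u3 = -q / 2.0 + disc
    if abs(u3) < 1e-14:              # avoid dividing by a vanishing cube root
        u3 = -q / 2.0 - disc
    u = u3 ** (1.0 / 3.0)
    omega = complex(-0.5, math.sqrt(3.0) / 2.0)
    roots = []
    for k in range(3):
        uk = u * omega ** k
        roots.append(uk - p / (3.0 * uk) - b / 3.0)
    return roots


def density(coeffs):
    """Density at a real point: largest Im(root) / pi, clipped at zero."""
    top = max(r.imag for r in cubic_roots(*coeffs))
    return max(top, 0.0) / math.pi


def k1_coeffs(z, e):                 # cubic (10.24) evaluated at y = z
    return (4 * e * (z - 1), (4 + 2 * e) * (z - 1), 3 + e + z, 1.0)


def k5_coeffs(z, e):                 # cubic (10.25) evaluated at y = z
    return (4 * e * (5 - z), (4 - 2 * e) * (5 - z), z + e - 9, 1.0)


def c_coeffs(z, e):                  # cubic (10.22)
    D = (z - 1) * (z - 5)
    return (4 * e * e * D, 4 * e * D, D + 2 * e * (z - 3) - e * e, z - 3 - e)


def signal_projection(z, e):
    """0.5 k(5, z) / (0.5 k(1, z) + 0.5 k(5, z))."""
    k1, k5 = density(k1_coeffs(z, e)), density(k5_coeffs(z, e))
    return 0.5 * k5 / (0.5 * k1 + 0.5 * k5)


z, e = 4.25, 2.0
mix = 0.5 * density(k1_coeffs(z, e)) + 0.5 * density(k5_coeffs(z, e))
print(mix, density(c_coeffs(z, e)), signal_projection(z, e))
```

At $z = 4.25$ and $e = 2$ the first two printed numbers agree, as (10.27) requires, and the ratio is close to one half, consistent with the crossing point of the two curves described in the text.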

Figure 10-3(a) compares this expression for values of $z$ in the support of the limiting
distribution (shown in Figure 10-3(b)) with the norm square of the projection of the
sample eigenvectors of a single realization of $C_N$ formed with $N = 100 = 2L$ onto the
subspace spanned by the eigenvalues of magnitude 5 in $A$ in (10.17). It is clear that
despite the predictions being asymptotic in nature, they accurately predict the behavior
for finite sized matrices as well.

The experiments and the theory capture many interesting features about the
behavior of sample eigenvectors:

1. If we consider the square norm of the projection onto each of the subspaces to
be a “reliability metric,” then it is immediately apparent that not all the sample
eigenvectors are equally reliable.

2. The behavior of the eigenvectors corresponding to the smallest and the largest
eigenvalues is very different. In fact, the middle eigenvectors have the greatest
projection on the eigenspace of $A$ spanned by the eigenvalue of magnitude 5.

Taken together, the computational tools developed allow us to use the machinery
of free probability to analytically describe the deterioration in the “reliability” of the
sample eigenvectors induced by the additive random moderate rank subspace perturbation.
Tools from “classical” perturbation theory [96] would have been inadequate in the
scenario considered because the matrix norm of $B$ is comparable to the matrix norm
of $A$ and the perturbation in question is not of low rank.

Figure 10-4 plots the kernel density function. The fact that the kernel
density function is a true bivariate probability density function over $\mathbb{R}^2$ follows from
Remark 10.25.

(a) The density function $dF^{A+B}$ and the scaled kernel density functions $0.5\, k_{A+B|A}(1, \cdot)$ and
$0.5\, k_{A+B|A}(5, \cdot)$.

(b) Empirical validation of the theoretical scaled kernel density function $0.5\, k_{A+B|A}(5, \cdot)$.

Figure 10-2. The composite limit eigenvalue density function is interpreted as the sum of the scaled
individual kernel density functions.

(a) Norm square of the projection of the eigenvectors of $C_N$ onto the signal subspace of $A_N$.

Figure 10-4. The Markov transition kernel density $k_{A+B|B}$ where $B = 2W$.

■ 10.4 Algebraic empirical covariance matrices

We conclude by applying this machinery to predict the deterioration of the eigenvectors
of empirical covariance matrices due to sample size constraints. The (broader) class of
algebraic Wishart sample covariance matrices for which this framework applies is
described next.


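Theorem 10.41. Let $A_n \mapsto A \in \mathcal{M}_{alg}$ and $B_N \mapsto B \in \mathcal{M}_{alg}$ be algebraic covariance
matrices with $G_{n,N}$ denoting an $n \times N$ (pure) Gaussian random matrix (see Definition
7.41). Let $X_{n,N} = A_n^{1/2}\, G_{n,N}\, B_N^{1/2}$. Then

$$S_n = X_{n,N} X_{n,N}' \mapsto S \in \mathcal{M}_{alg} \quad \text{and} \quad k_{S|A}(x, y) \in \mathcal{K}_{alg},$$

as $n, N \to \infty$ and $c_N = n/N \to c$.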


Proof. Let $Y_{n,N} = G_{n,N} B_N^{1/2}$, $T_n = Y_{n,N} Y_{n,N}'$, and $T_N = Y_{n,N}' Y_{n,N}$. Thus
$S_n = A_n \times T_n = A_n^{1/2}\, T_n\, A_n^{1/2}$. The matrix $T_n$, as defined, is invariant under
orthogonal/unitary transformations, though the matrix $T_N$ is not. Hence, by Corollary 7.56,
and since $A_n \mapsto A \in \mathcal{M}_{alg}$, $S_n \mapsto S \in \mathcal{M}_{alg}$ whenever $T_n \mapsto T \in \mathcal{M}_{alg}$. From Theorem
7.36, $T_n \mapsto T \in \mathcal{M}_{alg}$ if $T_N \mapsto \tilde{T} \in \mathcal{M}_{alg}$. The matrix $T_N = B_N^{1/2}\, G_{n,N}' G_{n,N}\, B_N^{1/2}$
is clearly algebraic by application of Corollary 7.56 and Theorem 7.31 since $B_N$ is
algebraic and $G_{n,N}' G_{n,N}$ is algebraic and unitarily invariant.

From Theorem 10.210, since $T_n$ and $A_n$ are algebraic, the conditional “eigenvector
distribution” of $S_n = A_n \times T_n$ is algebraic. This proves that $k_{S|A}(x, y) \in \mathcal{K}_{alg}$. □

The  theorem  can  be  restated  more  succinctly. 


Corollary 10.42. Algebraic sample covariance matrices with Wishart distribution have
limiting eigenvalue and conditional “eigenvector distributions” that are algebraic.


In high-dimensional inference applications, $n$ is often interpreted as the number
of variables (spatial dimension) while $N$ is the number of measurements (temporal
dimension). The matrices $A_n$ and $B_N$ then model the spatial and temporal covariance
structure of the collected data. The parameter $c_N = n/N$ is the ratio of the number
of variables to the number of measurements. In a sample size constrained setting, we
expect $c_N$ to be significantly greater than zero.

The proof of Theorem 10.41 provides us with a recipe for computing the polynomials
that encode the limiting eigenvalue and conditional eigenvector distributions of $S$ in the
far more general situation where the observation vectors are modelled as samples of a
multivariate Gaussian with spatio-temporal correlations. The limiting eigenvalue and
eigenvector distribution of $S$ depends on the limiting (algebraic) eigenvalue distributions
of $A$ and $B$. The symbolic code for computing these polynomials is listed in Table 10.2.
When there are no temporal correlations, i.e., $B = I$, we set $L^{B}_{mz} = m(1 - z) - 1$ in
the computations and proceed to extract the density and moments from $L^{S}_{mz}$ as usual.
Note the dependence on the limiting value of the ratio $c = \lim c_N$.

Matlab Code

function [LmxyS,LmzS] = AtimesWishtimesB(LmzA,LmzB,c)
syms m x y z
LmzW = c*z*m^2-(1-c-z)*m+1;              % Wishart polynomial with ratio c
LmzWt = transposeA(LmzW,c);              % pass to the N x N companion matrix
LmzT = AtimesB(LmzWt,LmzB);              % multiply by the temporal covariance B
LmzTt = transposeA(LmzT,1/c);            % return to the n x n matrix T_n
[LmxyS,LmzS] = AtimesBkernel(LmzA,LmzTt);  % multiply by A and extract the kernel

Table 10.2. Symbolic code for computing the bivariate and trivariate polynomials which, respectively,
encode the limiting eigenvalue and conditional eigenvector distributions of algebraic empirical covariance
matrices.

Thus the methods developed allow the practitioner to analytically predict the quality
of the eigenvectors of $S$ relative to the eigenvectors of the (spatial) covariance matrix
$A$ for $c \in (0, \infty)$. This provides a window into how sample size constraints affect the
estimation of the eigenvectors in high-dimensional settings.


■  10.4.1  Example 

Consider the sample covariance matrix $S$ formed as in Theorem 10.41. Assume that
$A_n$ and $B_N$ have the same limiting eigenvalue distribution function given by

$$F^{A}(x) = F^{B}(x) = 0.5\, \mathbb{I}_{[1, \infty)} + 0.5\, \mathbb{I}_{[2, \infty)}.$$

The Stieltjes transform of the limiting eigenvalue distribution function is

$$m_A(z) = m_B(z) = \frac{0.5}{2 - z} + \frac{0.5}{1 - z}, \tag{10.28}$$

which satisfies the polynomial equation $L^{A}_{mz}(m, z) = L^{B}_{mz}(m, z) = 0$ where

$$L^{A}_{mz} = L^{B}_{mz} = (-6z + 2z^2 + 4)\, m + 2z - 3.$$

Using the symbolic tools developed, we can obtain the polynomials that encode the
limiting eigenvalue and eigenvector distribution from the sequence of commands:

>> syms m x y z c
>> LmzA = m - 0.5/(2-z) - 0.5/(1-z);
>> LmzA = irreducLuv(LmzA,m,z);
>> LmzB = LmzA;
>> [LmxyS,LmzS] = AtimesWishtimesB(LmzA,LmzB,c);

from which we obtain the bivariate polynomial

$$L^{S}_{mz}(m, z) = \sum_{j=1}^{6} \sum_{k=1}^{4} [T^{S}_{mz}]_{jk}\, m^{j-1} z^{k-1}$$

where

$$T^{S}_{mz} = \begin{bmatrix}
-18c + 18c^2 & 18c - 9 & 4 & 0 \\
-108c^2 + 36c + 72c^3 & -112c + 18 + 130c^2 & -18 + 54c & 4 \\
64c^2 + 64c^4 - 128c^3 & 72c - 324c^2 + 288c^3 & 224c^2 - 112c & 36c \\
0 & 64c^2 - 256c^3 + 192c^4 & 360c^3 - 216c^2 & 112c^2 \\
0 & 0 & 192c^4 - 128c^3 & 144c^3 \\
0 & 0 & 0 & 64c^4
\end{bmatrix}.$$
Using the sequence of commands described in Section 8.3, we obtain the first four terms,
parameterized by $c$, of the moment generating function:

$$M_S(z) = 1 + \frac{9}{4}\, z + \frac{45}{8}(c + 1)\, z^2 + \left( \frac{243}{16} c^2 + \frac{675}{16} c + \frac{243}{16} \right) z^3 + \left( \frac{1377}{32} c^3 + \frac{3555}{16} c^2 + \frac{3555}{16} c + \frac{1377}{32} \right) z^4 + O(z^5).$$

Note how the moments explicitly capture the impact of $c$ on the limiting distribution.
This is notable since, for this particular example, it is not possible to express
the density function in closed form.
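The moment expansion can be reproduced directly from the coefficient matrix $T^{S}_{mz}$ by the same kind of order-by-order series solution used in Section 8.3. The Python sketch below is a minimal stand-in, not the thesis's own routine: it assumes the 6 x 4 layout in which rows index powers of $m$ and columns index powers of $z$, substitutes $z = 1/w$, multiplies by $w^3$, and solves for $m(w) = -(w + M_1 w^2 + \cdots)$ using exact rational arithmetic. The function names are hypothetical.

```python
from fractions import Fraction


def t_s_matrix(c):
    """Coefficient matrix of L^S_mz: rows are m^0..m^5, columns are z^0..z^3."""
    return [
        [-18 * c + 18 * c ** 2, 18 * c - 9, 4, 0],
        [-108 * c ** 2 + 36 * c + 72 * c ** 3, -112 * c + 18 + 130 * c ** 2,
         -18 + 54 * c, 4],
        [64 * c ** 2 + 64 * c ** 4 - 128 * c ** 3,
         72 * c - 324 * c ** 2 + 288 * c ** 3, 224 * c ** 2 - 112 * c, 36 * c],
        [0, 64 * c ** 2 - 256 * c ** 3 + 192 * c ** 4,
         360 * c ** 3 - 216 * c ** 2, 112 * c ** 2],
        [0, 0, 192 * c ** 4 - 128 * c ** 3, 144 * c ** 3],
        [0, 0, 0, 64 * c ** 4],
    ]


def series_mul(a, b, order):
    """Product of two power series in w, truncated after w**order."""
    out = [Fraction(0)] * (order + 1)
    for i in range(order + 1):
        for j in range(order + 1 - i):
            out[i + j] += a[i] * b[j]
    return out


def moments_from_matrix(T, count):
    """First `count` moments of the density encoded by sum T[j][k] m^j z^k = 0."""
    rows, cols = len(T), len(T[0])
    order = count + 1
    slope = T[1][cols - 1]           # coefficient of m * w^0 after scaling by w^(cols-1)
    m = [Fraction(0)] * (order + 1)
    for t in range(1, order + 1):
        total = [Fraction(0)] * (order + 1)
        power = [Fraction(1)] + [Fraction(0)] * order
        for j in range(rows):
            for k in range(cols):
                shift = cols - 1 - k          # z^k becomes w^(cols - 1 - k)
                for r in range(order + 1 - shift):
                    total[r + shift] += T[j][k] * power[r]
            power = series_mul(power, m, order)
        m[t] = -total[t] / slope
    return [-m[t + 1] for t in range(1, count + 1)]


print(moments_from_matrix(t_s_matrix(Fraction(1, 4)), 4))
```

Evaluating at $c = 0$ returns the moments of two equally weighted atoms at 1.5 and 3, which is the limiting behavior described for Figure 10-5(a).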

Figure 10-5(a) plots the limiting eigenvalue density function of $S_n$ for different values
of $c$. Note the convergence of the distribution, as $c \to 0$, to an atomic distribution
with two equally weighted atoms at 1.5 and 3. Figure 10-5(b) compares theory with
experiment for $c = 0.25$.

The trivariate polynomial $L^{S}_{mxy}$ is too messy to print. The kernel density functions
$k_{S|A}(2, y)$ and $k_{S|A}(1, y)$ have Stieltjes transforms which are solutions of the equation
$L^{S|i}_{my}(m, y) = 0$ where

$$L^{S|i}_{my}(m, y) = \sum_{j=1}^{6} \sum_{k=1}^{4} [T^{S|i}_{my}]_{jk}\, m^{j-1} y^{k-1}$$

for $i = 1, 2$ with

$$T^{S|2}_{my} = \begin{bmatrix}
8 & 0 & 0 & 0 \\
-24 + 48c & -8 & 0 & 0 \\
64c^2 - 64c & 48 - 24c & -8 & 0 \\
0 & 96c & -24 - 48c & 8 \\
0 & 0 & -32c - 48c^2 & 24c \\
0 & 0 & 0 & 16c^2
\end{bmatrix},$$

$$T^{S|1}_{my} = \begin{bmatrix}
8 & 0 & 0 & 0 \\
-12 + 24c & 16 & 0 & 0 \\
-16c + 16c^2 & 42c - 12 & 10 & 0 \\
0 & 24c^2 - 12c & 21c - 3 & 2 \\
0 & 0 & 9c^2 - 2c & 3c \\
0 & 0 & 0 & c^2
\end{bmatrix}.$$
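A simple consistency check on the two coefficient matrices is available in the limit $c \to 0$, where the eigenvalue distribution of $S$ collapses onto atoms at 1.5 and 3, so that the kernel for $x = i$ should reduce to a point mass at $1.5\, i$ with Stieltjes transform $1/(1.5\, i - y)$. The Python sketch below evaluates the polynomial exactly at a few rational points under that assumption. The 6 x 4 layout (rows for powers of $m$, columns for powers of $y$) and the function names are assumptions of this illustration.

```python
from fractions import Fraction


def kernel_matrix(i, c):
    """Coefficient matrix of L^{S|i}_my: rows are m^0..m^5, columns are y^0..y^3."""
    if i == 2:
        return [
            [8, 0, 0, 0],
            [-24 + 48 * c, -8, 0, 0],
            [64 * c ** 2 - 64 * c, 48 - 24 * c, -8, 0],
            [0, 96 * c, -24 - 48 * c, 8],
            [0, 0, -32 * c - 48 * c ** 2, 24 * c],
            [0, 0, 0, 16 * c ** 2],
        ]
    return [
        [8, 0, 0, 0],
        [-12 + 24 * c, 16, 0, 0],
        [-16 * c + 16 * c ** 2, 42 * c - 12, 10, 0],
        [0, 24 * c ** 2 - 12 * c, 21 * c - 3, 2],
        [0, 0, 9 * c ** 2 - 2 * c, 3 * c],
        [0, 0, 0, c ** 2],
    ]


def evaluate(T, m, y):
    """Value of sum_{j,k} T[j][k] m^j y^k."""
    return sum(T[j][k] * m ** j * y ** k
               for j in range(len(T)) for k in range(len(T[0])))


def residual_at_c_zero(i, y):
    """Residual of the c = 0 polynomial at the point-mass transform 1/(1.5 i - y)."""
    m = 1 / (Fraction(3, 2) * i - y)
    return evaluate(kernel_matrix(i, 0), m, y)


for i in (1, 2):
    print(i, [residual_at_c_zero(i, Fraction(y)) for y in (7, -2, Fraction(1, 3))])
```

All residuals are exactly zero, so the matrices are at least consistent with the stated $c \to 0$ limit.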

Figure 10-6(a) compares experiment with theory, over values of $z$ in the support of
the limiting distribution (shown in Figure 10-6(b)), for the norm square of the projection
of the sample eigenvectors of a single realization of $S_n$ formed with $N = 400 = 4n$ onto
the subspace spanned by the eigenvalues of value 2 in $A$. It is clear that despite the
predictions being asymptotic in nature, they accurately predict the behavior for finite
sized matrices as well.

■  10.5  Future  work 

The  ability  of  predict  the  deterioration  in  the  quality  of  the  eigenvectors  of  algebraic 
empirical  covariance  matrices  due  to  sample  size  constraints  raises  the  possibility  of 
whether  the  results  can  be  used  to  formulate  new  high-dimensional  covariance  matrix 
estimation  algorithms.  Covariance  estimation  from  frequentist  and  Bayesian  perspec¬ 
tives  is  an  established  topic,  e.g,  [42,95,  117];  recently  some  authors  have  begun  to 
address  large  N  questions,  for  example  by  in  effect  using  linear  shrinkage  on  eigenval¬ 
ues  (e.g  [26,58,89]),  with  applications  for  example  in  empirical  finance  [56,57],  There 
has  been  interest  in  structured  covariance  matrix  estimation  for  signal  processing  appli¬ 
cations  as  in  [38,39].  Smith  treats  covariance  matrix  estimation  from  a  geometric  point 
of  view  in  [90].  Combining  the  insights  of  these  various  authors  with  the  analytical 
results  that  capture  the  degradation  in  the  estimated  eigenvectors  offers  a  possibility 
of  attacking  this  problem  from  a  fresh  perspective,  A  wide  open  problem  is  the  under¬ 
standing  the  nature  of  the  fluctuations,  for  both  the  eigenvalues  and  eigenvectors,  for  a 
broader  class  of  random  matrice,  Related  questions  include  characterizing  the  rate  of 
convergence. 
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(a)  The  density  of  the  limiting  eigenvalue  distribution  funct  ion  oFSn  for  different  values 
of  c .  When  c  —  0,003  it  means  that  there  are  roughly  1000  Limes  as  many  temporal 
measurements  as  there  are  spatial  observations  and  so  on. 


(b) The theoretical limiting density function (solid line) for $c = 0.25$ is compared with
the normalized histogram of the eigenvalues of $S_n$ collected over 4000 Monte-Carlo
trials with $n = 100$ and $N = 400$.


Figure 10-5. A Wishart random matrix, $S_n$, with spatio-temporal correlations. The spatial and the
temporal covariance matrices have limiting eigenvalue distribution given by (10.28).
(a) Norm square of the projection of the eigenvectors of a single random matrix realization onto
the eigenspace of A spanned by the eigenvalue of value 5.


(b) The theoretical limiting density function (solid line) for $c = 0.25$ is compared with the normalized
histogram of the eigenvalues of $S_n$ collected over 4000 Monte-Carlo trials with $n = 100$ and $N = 400$.


Figure 10-6. A Wishart random matrix, $S_n$, with spatio-temporal correlations. The spatial and the
temporal covariance matrices have limiting eigenvalue distribution given by (10.28).


Chapter  11 


Afterword 


In the first part of this dissertation, we applied random matrix theory to inference
problems  where  the  measurements  were  drawn  from  a  multivariate  normal  distribution. 
By  exploiting  the  properties  of  the  eigenvalues  of  large  dimensional  Wishart  distributed 
random  matrices  we  were  able  to  design  algorithms  that  turned  the  underlying  high- 
dimensionality  into  an  advantage. 

In the second part of this dissertation, we developed a powerful method that allowed
us  to  characterize  the  eigenvalues  of  a  broad  class  of  random  matrices  well  beyond  the 
special  case  of  matrices  with  Wishart  distribution.  A  natural  question  then  arises:  Can 
these  more  complicated  matrix  models  be  physically  justified  so  that  the  results  can  be 
applied?  The  conclusion  of  this  dissertation  makes  this  the  opportune  moment  for  us 
to  pose  this  question  and  share  some  of  our  thoughts  on  this  matter. 

We  feel  that  an  important  extension  of  the  work  in  this  thesis  is  the  development  of 
random  matrix  models  that  adequately  capture  the  essential  complexities  of  the  real- 
world  high-dimensional  inference  problem  without  being  so  complicated  that  we  cannot 
get answers for them. We anticipate that initially this will have to be done on an
application-by-application basis in close collaboration with experts in the field who can
ensure that aspects of the problem that could affect the solution are not missed.

Progress  on  this  front  is  likely  to  be  deliberate  because  there  is  an  art  to  model 
building  which  makes  it  difficult  to  rush,  although  like  other  artistic  endeavors,  as  Gil 
Strang puts it [97, p. 9], “people who do it well will agree when it is done well.”

Researchers used to producing differential equations to model their problem might
have to work a bit longer and squint a bit harder to discern the random matrix that is
buried in their problem, if there is one at all. The incentive for their effort is that if their random
matrix model fits into the general framework developed, then the full power of the
methods developed in this dissertation can be brought to bear on their problems. The
blessings of high-dimensionality, in an inferential context, will then fully manifest.

Of course, we acknowledge the possibility that practitioners experimenting with our
framework while model building might discover that the current theory does not suffice
and that additional theory and methods are needed. In concluding this dissertation, we
adopt the position that we would greet such a contrary discovery with a healthy dose
of gratification. After all, it would be a testimony to our belief that this dissertation
can be the starting point for this and other explorations.


Appendix  A 


Random  Matrices  and  Free 
Probability  Theory 


The material in this appendix is based on an exposition by Roland Speicher. It has
been included (almost) verbatim with his permission.

■  A.1  Moments of random matrices and asymptotic freeness

What can we say about the eigenvalue distribution of the sum A + B of the matrices?
Of course, the latter is not just determined by the eigenvalues of A and the eigenvalues
of B, but also by the relation between the eigenspaces of A and of B. Actually, it
is quite a hard problem (Horn's conjecture), which was only solved recently, to
characterize all possible eigenvalue distributions of A + B. However, if one is asking
this question in the context of $N \times N$ random matrices, then in many situations the
answer becomes deterministic in the limit $N \to \infty$.

Definition A.11. Let $A = (A_N)_{N \in \mathbb{N}}$ be a sequence of $N \times N$ random matrices. We
say that A has a limit eigenvalue distribution if the limit of all moments

$$\alpha_n := \lim_{N \to \infty} E[\mathrm{tr}(A_N^n)] \qquad (n \in \mathbb{N})$$

exists, where E denotes the expectation and tr the normalized trace.

Using the language of limit eigenvalue distribution as in Definition A.11, our question
becomes: Given two random matrix ensembles of $N \times N$ random matrices,
$A = (A_N)_{N \in \mathbb{N}}$ and $B = (B_N)_{N \in \mathbb{N}}$, with limit eigenvalue distribution, does their sum
$C = (C_N)_{N \in \mathbb{N}}$, with $C_N = A_N + B_N$, also have a limit eigenvalue distribution, and
furthermore, can we calculate the limit moments $\alpha_n^{C}$ of C from the limiting moments $(\alpha_k^{A})_{k \ge 1}$
of A and the limiting moments $(\alpha_k^{B})_{k \ge 1}$ of B in a deterministic way? It turns out
that this is the case if the two ensembles are in generic position, and then the rule for
calculating the limit moments of C is given by Voiculescu's concept of “freeness”.

Lemma A.12 (Voiculescu [108]). Let A and B be two random matrix ensembles of
$N \times N$ random matrices, $A = (A_N)_{N \in \mathbb{N}}$ and $B = (B_N)_{N \in \mathbb{N}}$, each of them with a limit
eigenvalue distribution. Assume that A and B are independent (i.e., for each $N \in \mathbb{N}$,
all entries of $A_N$ are independent from all entries of $B_N$), and that at least one of
them is unitarily invariant (i.e., for each N, the joint distribution of the entries does
not change if we conjugate the random matrix with an arbitrary unitary $N \times N$ matrix).
Then A and B are asymptotically free in the sense of the following definition.

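Definition A.13 (Voiculescu [105]). Two random matrix ensembles $A = (A_N)_{N \in \mathbb{N}}$
and $B = (B_N)_{N \in \mathbb{N}}$ with limit eigenvalue distributions are asymptotically free if we have
for all $p \ge 1$ and all integers $n(1), m(1), \ldots, n(p), m(p) \ge 1$ that

$$\lim_{N \to \infty} E\Big[\mathrm{tr}\big\{(A_N^{n(1)} - \alpha_{n(1)}^{A} I)\cdot(B_N^{m(1)} - \alpha_{m(1)}^{B} I)\cdots(A_N^{n(p)} - \alpha_{n(p)}^{A} I)\cdot(B_N^{m(p)} - \alpha_{m(p)}^{B} I)\big\}\Big] = 0.$$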

Thus, for example, if A and B are asymptotically free, then we necessarily have

$$\lim_{N \to \infty} E\big[\mathrm{tr}\{(A_N - \alpha_1^{A} I)\cdot(B_N^{2} - \alpha_2^{B} I)\cdot(A_N^{3} - \alpha_3^{A} I)\cdot(B_N^{4} - \alpha_4^{B} I)\}\big] = 0,$$

where we have inserted $n(1) = 1$, $n(2) = 3$, $m(1) = 2$, $m(2) = 4$ in Definition A.13.
Embedded in the definition of asymptotic freeness is a rule which allows us to calculate
all mixed moments in A and B, i.e., all expressions of the form

$$\lim_{N \to \infty} E\big[\mathrm{tr}(A_N^{n(1)} B_N^{m(1)} A_N^{n(2)} B_N^{m(2)} \cdots A_N^{n(p)} B_N^{m(p)})\big]$$

out of the limit moments of A and the limit moments of B. In particular, this means
that all limit moments of A + B (which are sums of mixed moments) exist, thus A + B
has a limit distribution, and that these moments are actually determined in terms of the limit moments of
A and the limit moments of B. The actual calculation rule is not directly clear from
the above definition but a basic result of Voiculescu shows how this can be achieved
by going over from the moments $\alpha_n$ to new quantities $\kappa_n$. In [91], the combinatorial
structure behind those $\kappa_n$ was revealed and the name “free cumulants” was coined for
them.
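
As a minimal worked case, added here for illustration and not part of the original exposition, take $p = 1$ in Definition A.13. Expanding the product and using that $E[\mathrm{tr}(A_N^{n})]$ and $E[\mathrm{tr}(B_N^{m})]$ converge to $\alpha_n^{A}$ and $\alpha_m^{B}$ gives the simplest mixed moment:

```latex
0 = \lim_{N\to\infty} E\big[\mathrm{tr}\{(A_N^{n} - \alpha_n^{A} I)(B_N^{m} - \alpha_m^{B} I)\}\big]
  = \lim_{N\to\infty} E\big[\mathrm{tr}(A_N^{n} B_N^{m})\big] - \alpha_n^{A}\,\alpha_m^{B},
\qquad\text{hence}\qquad
\lim_{N\to\infty} E\big[\mathrm{tr}(A_N^{n} B_N^{m})\big] = \alpha_n^{A}\,\alpha_m^{B}.
```

Longer alternating products are handled the same way, by expanding and substituting the mixed moments of lower order that are already known.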

Definition A.14 (Voiculescu [106], Speicher [91]). Given the moments $(\alpha_n)_{n \ge 1}$ of
some distribution (or limit moments of some random matrix ensemble), we define the
corresponding free cumulants $(\kappa_n)_{n \ge 1}$ by the following relation between their generating
power series: If we put

$$M(x) := 1 + \sum_{n \ge 1} \alpha_n x^n \qquad\text{and}\qquad C(x) := 1 + \sum_{n \ge 1} \kappa_n x^n,$$

then we require as a relation between these formal power series that

$$C(xM(x)) = M(x).$$
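
The relation $C(xM(x)) = M(x)$ can be unwound order by order: comparing the coefficients of $x^n$ gives $\alpha_n = \sum_{s=1}^{n} \kappa_s\,[x^{n-s}]\,M(x)^s$, so each $\kappa_n$ follows from $\alpha_1, \ldots, \alpha_n$. The following is a minimal sketch of that recursion, added for illustration; the function names are hypothetical and exact rational arithmetic is just a convenient choice.

```python
from fractions import Fraction


def series_mul(a, b, order):
    """Multiply two power series (coefficient lists), truncated at x**order."""
    out = [Fraction(0)] * (order + 1)
    for i, ai in enumerate(a[:order + 1]):
        for j, bj in enumerate(b[:order + 1 - i]):
            out[i + j] += ai * bj
    return out


def free_cumulants_from_moments(moments):
    """Map [alpha_1, ..., alpha_n] to [kappa_1, ..., kappa_n] via C(x M(x)) = M(x)."""
    n_max = len(moments)
    M = [Fraction(1)] + [Fraction(a) for a in moments]
    # powers[s] holds the coefficients of M(x)**s up to order n_max.
    powers = [[Fraction(1)] + [Fraction(0)] * n_max]
    for _ in range(n_max):
        powers.append(series_mul(powers[-1], M, n_max))
    kappas = []
    for n in range(1, n_max + 1):
        # The x**n coefficient of C(x M(x)) is sum_s kappa_s * [x**(n-s)] M(x)**s;
        # the s = n term is kappa_n itself because M(0) = 1.
        known = sum(kappas[s - 1] * powers[s][n - s] for s in range(1, n))
        kappas.append(M[n] - known)
    return kappas


# Moments of the unit-variance semicircle law: Catalan numbers at even orders.
semicircle_cumulants = free_cumulants_from_moments([0, 1, 0, 2, 0, 5])
# Moments 1, 2, 5, 14 (the Catalan numbers themselves) for comparison.
catalan_cumulants = free_cumulants_from_moments([1, 2, 5, 14])
print(semicircle_cumulants, catalan_cumulants)
```

For the semicircle moments only $\kappa_2 = 1$ is nonzero, while the Catalan moments give $\kappa_n = 1$ for every $n$.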
Voiculescu actually formulated the relation above in a slightly different way using
the so-called R-transform $\mathcal{R}(x)$, which is related to $C(x)$ by the relation

$$C(x) = 1 + x\mathcal{R}(x)$$

and in terms of the Cauchy transform $G(x)$ corresponding to a measure with moments
$\alpha_n$, which is related to $M(x)$ by

$$G(x) = \frac{M(1/x)}{x}.$$

In these terms the equation $C(xM(x)) = M(x)$ says that

$$\frac{1}{G(x)} + \mathcal{R}(G(x)) = x, \tag{A.1}$$

i.e., that $G(x)$ and $K(x) := \frac{1}{x} + \mathcal{R}(x)$ are inverses of each other under composition.
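
As a worked illustration, added here and not part of the original exposition, take the semicircle law of unit variance, whose only nonzero free cumulant is $\kappa_2 = 1$, so that $C(x) = 1 + x^2$ and $\mathcal{R}(x) = x$. Relation (A.1) then becomes a quadratic equation for $G(x)$:

```latex
\frac{1}{G(x)} + G(x) = x
\quad\Longrightarrow\quad
G(x)^2 - x\,G(x) + 1 = 0
\quad\Longrightarrow\quad
G(x) = \frac{x - \sqrt{x^2 - 4}}{2},
```

where the branch of the square root is fixed by requiring $G(x) \sim 1/x$ as $x \to \infty$.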

One should also note that the relation $C(xM(x)) = M(x)$ determines the moments
uniquely in terms of the cumulants and the other way around. The relevance of the
$\kappa_n$ and the R-transform for our problem comes from the following result of Voiculescu,
which provides, together with (A.1), a very efficient way for calculating eigenvalue
distributions of the sum of asymptotically free random matrices.

Lemma A.15 (Voiculescu [106]). Let A and B be two random matrix ensembles
which are asymptotically free. Denote by $\kappa_n^{A}$, $\kappa_n^{B}$, $\kappa_n^{A+B}$ the free cumulants of A, B,
A + B, respectively. Then one has for all $n \ge 1$ that

$$\kappa_n^{A+B} = \kappa_n^{A} + \kappa_n^{B}.$$

Alternatively,

$$\mathcal{R}^{A+B}(x) = \mathcal{R}^{A}(x) + \mathcal{R}^{B}(x).$$

This lemma is one reason for calling the $\kappa_n$ cumulants (as they linearize the “free
convolution” in the same way as the usual convolution is linearized by classical
cumulants), but there is also another justification for this, namely they are also the limit of
classical cumulants of the entries of our random matrix, in the case that this is unitarily
invariant.
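
The additivity of free cumulants gives a concrete recipe for the moments of a sum of asymptotically free ensembles: convert each set of moments to free cumulants, add, and convert back. The sketch below is an added illustration under that reading of Lemma A.15; the function names are hypothetical, and the input is the symmetric law on $\{-1, +1\}$, chosen only as a simple test case.

```python
from fractions import Fraction


def series_mul(a, b, order):
    """Multiply two power series (coefficient lists), truncated at x**order."""
    out = [Fraction(0)] * (order + 1)
    for i, ai in enumerate(a[:order + 1]):
        for j, bj in enumerate(b[:order + 1 - i]):
            out[i + j] += ai * bj
    return out


def moments_to_cumulants(moments):
    """[alpha_1..alpha_n] -> [kappa_1..kappa_n] using C(x M(x)) = M(x)."""
    n_max = len(moments)
    M = [Fraction(1)] + [Fraction(a) for a in moments]
    powers = [[Fraction(1)] + [Fraction(0)] * n_max]
    for _ in range(n_max):
        powers.append(series_mul(powers[-1], M, n_max))
    kappas = []
    for n in range(1, n_max + 1):
        known = sum(kappas[s - 1] * powers[s][n - s] for s in range(1, n))
        kappas.append(M[n] - known)
    return kappas


def cumulants_to_moments(kappas):
    """[kappa_1..kappa_n] -> [alpha_1..alpha_n] by iterating M <- C(x M(x))."""
    n_max = len(kappas)
    M = [Fraction(1)] + [Fraction(0)] * n_max
    for _ in range(n_max):  # each pass fixes at least one more coefficient
        xM = [Fraction(0)] + M[:n_max]  # coefficients of x * M(x)
        new_M = [Fraction(1)] + [Fraction(0)] * n_max
        power = [Fraction(1)] + [Fraction(0)] * n_max  # (x M(x))**0
        for kappa in kappas:
            power = series_mul(power, xM, n_max)
            new_M = [u + kappa * v for u, v in zip(new_M, power)]
        M = new_M
    return M[1:]


def free_sum_moments(moments_a, moments_b):
    """Limit moments of A + B for asymptotically free A and B."""
    ka = moments_to_cumulants(moments_a)
    kb = moments_to_cumulants(moments_b)
    return cumulants_to_moments([x + y for x, y in zip(ka, kb)])


bernoulli = [0, 1, 0, 1, 0, 1]  # moments of the law with mass 1/2 at -1 and +1
sum_moments = free_sum_moments(bernoulli, bernoulli)
print(sum_moments)
```

The even moments that come out, 2, 6, 20, are the central binomial coefficients, i.e. the moments of the arcsine law on $[-2, 2]$.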

Proposition A.16. Let $A = (A_N)_{N \in \mathbb{N}}$ be a unitarily invariant random matrix ensemble
of $N \times N$ random matrices $A_N$ whose limit eigenvalue distribution exists. Then
the free cumulants of this matrix ensemble can also be expressed as the limit of special
classical cumulants of the entries of the random matrices: If $A_N = (a_{ij}^{(N)})_{i,j=1}^{N}$, then

$$\kappa_n^{A} = \lim_{N \to \infty} N^{n-1}\, c_n\big(a_{i(1)i(2)}^{(N)},\ a_{i(2)i(3)}^{(N)},\ \ldots,\ a_{i(n)i(1)}^{(N)}\big)$$

for any choice of distinct $i(1), \ldots, i(n)$.
Proof. This appears in Collins, Mingo, Sniady, Speicher [23]. □

■  A.2  Fluctuations of random matrices and asymptotic second
order freeness

There are many more refined questions about the limiting eigenvalue distribution of
random matrices. In particular, questions around fluctuations have received a lot of
interest in the last decade or so. The main motivation for Speicher and colleagues
introducing the concept of “second order freeness” was to understand the global fluctuations
of the eigenvalues, which means that we look at the probabilistic behavior of traces of
powers of our matrices. The limiting eigenvalue distribution, as considered in the last
section, gives us the limit of the average of these traces. However, one can make more
refined statements about their distributions. Consider a random matrix $A = (A_N)_{N \in \mathbb{N}}$
and look at the normalized traces $\mathrm{tr}(A_N^k)$. Our assumption of a limit eigenvalue
distribution means that the limits $\alpha_k := \lim_{N \to \infty} E[\mathrm{tr}(A_N^k)]$ exist. It turned out that in
many cases the fluctuation around this limit,

$$\mathrm{tr}(A_N^k) - \alpha_k,$$

is asymptotically Gaussian of order $1/N$; i.e., the random variable

$$N \cdot (\mathrm{tr}(A_N^k) - \alpha_k) = \mathrm{Tr}(A_N^k) - N\alpha_k = \mathrm{Tr}(A_N^k - \alpha_k I)$$

(where Tr denotes the unnormalized trace) converges for $N \to \infty$ to a normal variable.
Actually, the whole family of centered unnormalized traces $(\mathrm{Tr}(A_N^k) - N\alpha_k)_{k \ge 1}$
converges to a centered Gaussian family.

Note that in the theory of Speicher and colleagues the formulation is in terms of complex
random matrices; in the case of real random matrices there are additional complications
which their theory does not currently account for but which are likely to be resolved in
future investigations.

Thus the main information about fluctuations of our considered ensemble is contained
in the covariance matrix of the limiting Gaussian family, i.e., in the quantities

$$\alpha_{m,n} := \lim_{N \to \infty} \mathrm{cov}\big(\mathrm{Tr}(A_N^m),\ \mathrm{Tr}(A_N^n)\big).$$

Let us emphasize that the $\alpha_n$ and the $\alpha_{m,n}$ are actually limits of classical cumulants
of traces; namely of the expectation as first and the variance as second cumulant.
Nevertheless, the $\alpha$'s will behave and will also be treated like moments; accordingly we
will call the $\alpha_{m,n}$ “fluctuation moments”. We will below define some other quantities
which take the role of cumulants in this context.

This kind of convergence to a Gaussian family was formalized in [64] by the notion
of “second order limit distribution” (see Definition 4.25).

Definition A.21. Let $A = (A_N)_{N \in \mathbb{N}}$ be an ensemble of $N \times N$ random matrices $A_N$.
We say that it has a second order limit distribution if for all $m, n \ge 1$ the limits

$$\alpha_n := \lim_{N \to \infty} c_1\big(\mathrm{tr}(A_N^n)\big)$$

and

$$\alpha_{m,n} := \lim_{N \to \infty} c_2\big(\mathrm{Tr}(A_N^m),\ \mathrm{Tr}(A_N^n)\big)$$

exist and if

$$\lim_{N \to \infty} c_r\big(\mathrm{Tr}(A_N^{n(1)}),\ \ldots,\ \mathrm{Tr}(A_N^{n(r)})\big) = 0$$

for all $r \ge 3$ and all $n(1), \ldots, n(r) \ge 1$.

We can now ask the same kind of question for the limit fluctuations as for the limit
moments; namely, if we have two random matrix ensembles A and B and we know the
second order limit distribution of A and the second order limit distribution of B, does
this imply that we have a second order limit distribution for A + B, and, if so, is there
an effective way for calculating it? Again, we can only hope for a positive solution to this
if A and B are in a kind of generic position. As it turned out, the same requirements
as before are sufficient for this. The rule for calculating mixed fluctuations constitutes
the essence of the definition of the concept of second order freeness.

Proposition A.22. Let A and B be two random matrix ensembles of $N \times N$ random
matrices, $A = (A_N)_{N \in \mathbb{N}}$ and $B = (B_N)_{N \in \mathbb{N}}$, each of them having a second order limit
distribution. Assume that A and B are independent and that at least one of them is
unitarily invariant. Then A and B are asymptotically free of second order in the sense
of the following definition.

Proof.  This  appears  in  Mingo,  Sniady,  Speicher  [63].  □ 

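Definition A.23 (Mingo, Speicher [64]). Consider two random matrix ensembles
$A = (A_N)_{N \in \mathbb{N}}$ and $B = (B_N)_{N \in \mathbb{N}}$, each of them with a second order limit distribution.
Denote by

$$Y_N\big(n(1), m(1), \ldots, n(p), m(p)\big)$$

the random variable

$$\mathrm{Tr}\big((A_N^{n(1)} - \alpha_{n(1)}^{A} I)(B_N^{m(1)} - \alpha_{m(1)}^{B} I) \cdots (A_N^{n(p)} - \alpha_{n(p)}^{A} I)(B_N^{m(p)} - \alpha_{m(p)}^{B} I)\big).$$

The random matrices $A = (A_N)_{N \in \mathbb{N}}$ and $B = (B_N)_{N \in \mathbb{N}}$ are asymptotically free of
second order if for all $n, m \ge 1$

$$\lim_{N \to \infty} c_2\big(\mathrm{Tr}(A_N^n - \alpha_n^{A} I),\ \mathrm{Tr}(B_N^m - \alpha_m^{B} I)\big) = 0$$

and for all $p, q \ge 1$ and $n(1), \ldots, n(p), m(1), \ldots, m(p), \tilde n(1), \ldots, \tilde n(q), \tilde m(1), \ldots, \tilde m(q) \ge 1$
we have

$$\lim_{N \to \infty} c_2\Big(Y_N\big(n(1), m(1), \ldots, n(p), m(p)\big),\ Y_N\big(\tilde n(1), \tilde m(1), \ldots, \tilde n(q), \tilde m(q)\big)\Big) = 0$$

if $p \ne q$, and otherwise (where we count modulo p for the arguments of the indices, i.e.,
$n(i + p) = n(i)$)

$$\lim_{N \to \infty} c_2\Big(Y_N\big(n(1), m(1), \ldots, n(p), m(p)\big),\ Y_N\big(\tilde n(p), \tilde m(p), \ldots, \tilde n(1), \tilde m(1)\big)\Big)
= \sum_{k=1}^{p} \prod_{i=1}^{p} \big(\alpha_{n(i+k)+\tilde n(i)}^{A} - \alpha_{n(i+k)}^{A}\,\alpha_{\tilde n(i)}^{A}\big)\big(\alpha_{m(i+k)+\tilde m(i+1)}^{B} - \alpha_{m(i+k)}^{B}\,\alpha_{\tilde m(i+1)}^{B}\big).$$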


Again, it is crucial to realize that this definition allows one (albeit in a complicated
way) to express every second order mixed moment, i.e., a limit of the form

$$\lim_{N \to \infty} c_2\Big(\mathrm{Tr}\big(A_N^{n(1)} B_N^{m(1)} \cdots A_N^{n(p)} B_N^{m(p)}\big),\ \mathrm{Tr}\big(A_N^{\tilde n(1)} B_N^{\tilde m(1)} \cdots A_N^{\tilde n(q)} B_N^{\tilde m(q)}\big)\Big)$$

in terms of the second order limits of A and the second order limits of B. In particular,
asymptotic freeness of second order also implies that the sum A + B of our random
matrix ensembles has a second order limit distribution and allows one to express it
in principle in terms of the second order limit distribution of A and the second order
limit distribution of B. As in the case of first order freeness, it is not clear at all
how this calculation of the fluctuations of A + B out of the fluctuations of A and the
fluctuations of B can be performed effectively. In [23] Speicher and colleagues were able
to solve this problem by providing a second order cumulant machinery, similar to the
first order case. Again, the idea is to go over to quantities which behave like cumulants
in this setting. The actual description of those relies on combinatorial objects (annular
non-crossing permutations), but as before this can be reformulated in terms of formal
power series. The definition can be spelled out in this form below.


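Definition A.24 (Collins, Mingo, Sniady, Speicher [23]). Let $(\alpha_n)_{n \ge 1}$ and $(\alpha_{m,n})_{m,n \ge 1}$
describe the first and second order limit moments of a random matrix ensemble. We
define the corresponding first and second order free cumulants $(\kappa_n)_{n \ge 1}$ and $(\kappa_{m,n})_{m,n \ge 1}$
by the following requirement in terms of the corresponding generating power series. Put

$$C(x) := 1 + \sum_{n \ge 1} \kappa_n x^n, \qquad C(x, y) := \sum_{m,n \ge 1} \kappa_{m,n}\, x^m y^n$$

and

$$M(x) := 1 + \sum_{n \ge 1} \alpha_n x^n, \qquad M(x, y) := \sum_{m,n \ge 1} \alpha_{m,n}\, x^m y^n.$$

Then we require as relations between these formal power series that

$$C(xM(x)) = M(x) \tag{A.2}$$

and for the second order

$$M(x, y) = H\big(xM(x),\ yM(y)\big) \cdot \frac{\frac{d}{dx}\big(xM(x)\big)}{M(x)} \cdot \frac{\frac{d}{dy}\big(yM(y)\big)}{M(y)}, \tag{A.3}$$

where

$$H(x, y) := C(x, y) - xy\, \frac{\partial^2}{\partial x\, \partial y} \log\left(\frac{xC(y) - yC(x)}{x - y}\right), \tag{A.4}$$

or equivalently,

$$M(x, y) = C\big(xM(x),\ yM(y)\big) \cdot \frac{\frac{d}{dx}\big(xM(x)\big)}{M(x)} \cdot \frac{\frac{d}{dy}\big(yM(y)\big)}{M(y)}
+ xy \left( \frac{\frac{d}{dx}\big(xM(x)\big) \cdot \frac{d}{dy}\big(yM(y)\big)}{\big(xM(x) - yM(y)\big)^2} - \frac{1}{(x - y)^2} \right). \tag{A.5}$$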


As in the first order case, instead of the moment power series $M(x, y)$ one can
consider a kind of second order Cauchy transform, defined by

$$G(x, y) := \frac{M(1/x,\ 1/y)}{xy}.$$

If we also define a kind of second order R-transform $\mathcal{R}(x, y)$ by

$$\mathcal{R}(x, y) := \frac{1}{xy}\, C(x, y),$$

then the formula (A.5) takes on a particularly nice form:

$$G(x, y) = G'(x)\, G'(y) \left\{ \mathcal{R}\big(G(x), G(y)\big) + \frac{1}{\big(G(x) - G(y)\big)^2} \right\} - \frac{1}{(x - y)^2}. \tag{A.6}$$

$G(x)$ is here, as before, the first order Cauchy transform, $G(x) = \frac{1}{x} M(1/x)$.

The $\kappa_{m,n}$ defined above deserve the name “cumulants” as they linearize the problem
of adding random matrices which are asymptotically free of second order. Namely, we
have the following lemma, which provides, together with (A.6), an effective machinery
for calculating the fluctuations of the sum of asymptotically free random matrices.


Lemma A.25. Let A and B be two random matrix ensembles which are asymptotically
free of second order. Then one has for all $m, n \ge 1$ that

$$\kappa_n^{A+B} = \kappa_n^{A} + \kappa_n^{B}$$

and

$$\kappa_{m,n}^{A+B} = \kappa_{m,n}^{A} + \kappa_{m,n}^{B}.$$

Alternatively,

$$\mathcal{R}^{A+B}(x) = \mathcal{R}^{A}(x) + \mathcal{R}^{B}(x)$$

and

$$\mathcal{R}^{A+B}(x, y) = \mathcal{R}^{A}(x, y) + \mathcal{R}^{B}(x, y).$$

Proof.  This  was  proved  by  Collins,  Mingo,  Sniady,  Speicher  in  [23].  □ 

Again, one can express the second order cumulants as limits of classical cumulants
of entries of a unitarily invariant matrix. In contrast to the first order case, we have
now to run over two disjoint cycles in the indices of the matrix entries.

Lemma A.26. Let $A = (A_N)_{N \in \mathbb{N}}$ be a unitarily invariant random matrix ensemble
which has a second order limit distribution. Then the second order free cumulants of
this matrix ensemble can also be expressed as the limit of classical cumulants of the
entries of the random matrices: If $A_N = (a_{ij}^{(N)})_{i,j=1}^{N}$, then

$$\kappa_{m,n}^{A} = \lim_{N \to \infty} N^{m+n}\, c_{m+n}\big(a_{i(1)i(2)}^{(N)},\ a_{i(2)i(3)}^{(N)},\ \ldots,\ a_{i(m)i(1)}^{(N)},\ a_{j(1)j(2)}^{(N)},\ a_{j(2)j(3)}^{(N)},\ \ldots,\ a_{j(n)j(1)}^{(N)}\big)$$

for any choice of distinct $i(1), \ldots, i(m), j(1), \ldots, j(n)$.

Proof.  This  was  proved  by  Collins,  Mingo,  Sniady,  Speicher  in  [23].  □ 


■  A.3  Wishart matrices and Proof of Proposition 4.27

Wishart  matrices,  in  the  large  size  limit,  fit  quite  well  into  the  framework  of  first  and 
second  order  free  probability  theory.  In  particular,  their  free  cumulants  of  first  and 
second  order  are  quite  easy  to  determine  and  are  of  a  particularly  nice  form.  We  will 
use this to give a proof of Proposition 4.27. The statements in that proposition go
back to the work of Bai and Silverstein, see, e.g., [10], who give a more direct proof via
analytic  calculations  of  the  Cauchy  transforms.  We  prefer  here,  however,  to  show  how 
Wishart  matrices  fit  conceptually  into  the  frame  of  free  probability  theory. 

Let  us  remark  that  whereas  the  results  around  first  order  freeness  are  valid  for 
complex  as  well  as  real  random  matrices,  this  is  not  the  case  any  more  for  the  second 
order;  there  are  some  complications  to  be  dealt  with  in  this  case  and  at  the  moment  the 
theory  of  second  order  freeness  for  real  random  matrices  has  not  yet  been  developed. 
Thus our proof of the fluctuation formula (4.13b) will only cover the complex case. The
fact that the real case differs from the complex case by a factor 2 can be found in the
work of Bai and Silverstein [10].

Instead of looking at the Wishart matrix $S := \frac{1}{m} X X'$ from Equation (4.1) we will
consider the closely related matrix

$$T := \frac{1}{m} X' X.$$

Note that S is an $n \times n$ matrix, whereas T is an $m \times m$ matrix. The relation between
the spectral behavior of those two matrices is quite straightforward, namely they have
the same non-zero eigenvalues, which are filled up with additional zeros for the larger
one. Thus the transition between these two matrices is very easy; their eigenvalue
distributions are related by a rescaling (since the first order moments go with the
normalized trace) and their fluctuations are the same (since the second order moments
$\alpha_{m,n}$ go with the unnormalized trace). The reason for considering T instead of S is
the following nice description of its first and second order distribution. In this theorem
we will realize the Wishart matrix $S = \frac{1}{m} X X'$ with covariance matrix $\Sigma$ in the form
$\Sigma^{1/2} Y Y' \Sigma^{1/2}$ where Y is an $n \times m$ Gaussian random matrix with independent entries
of mean zero and variance $1/m$. The matrix T takes then on the form

$$T = Y' \Sigma Y.$$

Note that we allow $\Sigma$ to be itself random in the following theorem.
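
The claim that S and T share their non-zero eigenvalues can be checked on a toy example through traces of powers: the unnormalized traces agree exactly, while the normalized traces differ by the factor $n/m$. The sketch below is an added illustration with a hypothetical $2 \times 3$ data matrix and plain-Python helper functions; it is not tied to any particular model from the text.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def trace_of_power(A, k):
    """Unnormalized trace Tr(A**k) for a square matrix A and k >= 1."""
    P = A
    for _ in range(k - 1):
        P = matmul(P, A)
    return sum(P[i][i] for i in range(len(A)))


# Hypothetical small data matrix X with n = 2 rows and m = 3 columns.
X = [[1, 2, 0],
     [0, 1, 3]]
n, m = len(X), len(X[0])
scale = Fraction(1, m)

S = [[scale * v for v in row] for row in matmul(X, transpose(X))]  # n x n
T = [[scale * v for v in row] for row in matmul(transpose(X), X)]  # m x m

# Pairs (Tr(S**k), Tr(T**k)) and their normalized versions (tr(S**k), tr(T**k)).
unnormalized = [(trace_of_power(S, k), trace_of_power(T, k)) for k in range(1, 5)]
normalized = [(s / n, t / m) for s, t in unnormalized]
print(unnormalized)
print(normalized)
```

In the notation of the proof further on, the factor $n/m$ between the normalized traces is what becomes $c$ in the relation between the moment series of S and T.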

Proposition A.31. Let $\Sigma = (\Sigma_n)_{n \in \mathbb{N}}$ be a random matrix ensemble of selfadjoint
$n \times n$ matrices and consider in addition a Gaussian ensemble $Y = (Y_n)_{n \in \mathbb{N}}$ of
non-selfadjoint rectangular Gaussian $n \times m$ random matrices (with mean zero and variance
$1/m$ for the entries) such that Y and $\Sigma$ are independent. Put

$$T := (Y_n^{*} \Sigma_n Y_n)_{n \in \mathbb{N}}.$$

In the following we consider the limit

$$n, m \to \infty \quad\text{such that}\quad \lim \frac{n}{m} =: c$$

for some fixed $c \in (0, \infty)$.

(1) Assume that the limit eigenvalue distribution of $\Sigma = (\Sigma_n)_{n \in \mathbb{N}}$ exists for $n \to \infty$.
Then T, considered as an ensemble of $m \times m$ random matrices $Y_n^{*} \Sigma_n Y_n$, has a limit
eigenvalue distribution. This limit eigenvalue distribution is determined by the fact that
its free cumulants are given by the scaled corresponding limit moments of $\Sigma$, i.e., for
all $j \ge 1$ we have

$$\kappa_j^{T} = c\, \alpha_j^{\Sigma}.$$

(2) Assume that we are in the complex case and that $\Sigma = (\Sigma_n)_{n \in \mathbb{N}}$ has a second
order limit distribution for $n \to \infty$. Then T has a second order limit distribution, which
is determined as follows: for all $i, j \ge 1$ we have

$$\kappa_i^{T} = c\, \alpha_i^{\Sigma} \qquad\text{and}\qquad \kappa_{i,j}^{T} = \alpha_{i,j}^{\Sigma}.$$

Proof. The first order statement of this theorem is due to Nica and Speicher, see [68];
the second order statement follows from the calculations in [64]. □

We will now use this theorem to prove our Proposition 4.27 in the complex case.
Proof. If

$$M^{\Sigma}(x) = 1 + \sum_{i \ge 1} \alpha_i^{\Sigma} x^i$$

is the generating power series for the limit moments of $\Sigma$, then the above proposition
says that the generating power series $C^{T}(x)$ for the free cumulants of T is related with
$M^{\Sigma}(x)$ by

$$C^{T}(x) = 1 + \sum_{i \ge 1} \kappa_i^{T} x^i = 1 + c \sum_{i \ge 1} \alpha_i^{\Sigma} x^i = (1 - c) + c\, M^{\Sigma}(x).$$

Thus, by the general relation $C^{T}(xM^{T}(x)) = M^{T}(x)$, we get the generating power
series $M^{T}(x)$ for the limit moments of T as a solution to the equation

$$1 - c + c\, M^{\Sigma}[x M^{T}(x)] = M^{T}(x). \tag{A.7}$$

Let us now rewrite this for the Wishart matrix S. Recall that the moments of S
and the moments of T are related by a simple scaling factor, resulting in a relation of
the form

$$M^{T}(x) = c\,(M^{S}(x) - 1) + 1.$$

Substituting this into (A.7) gives

$$M^{S}(x) = M^{\Sigma}[x\,(c\, M^{S}(x) - c + 1)].$$

Rewriting this in terms of

$$g(x) := \frac{1}{x} M^{S}(1/x) \qquad\text{and}\qquad g_{\Sigma}(x) := \frac{1}{x} M^{\Sigma}(1/x)$$

yields formula (4.11).

In order to get the result for second order one only has to observe that the fluctuations
of a non-random covariance matrix vanish identically, hence $C^{T}(x, y) = M^{\Sigma}(x, y) = 0$,
and thus (A.5) reduces directly to (4.13).

□ 
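
The first order part of the argument can be checked on truncated power series. The sketch below is an added illustration, not taken from the original exposition: it solves equation (A.7) coefficient by coefficient for the special case $\Sigma = I$ (all moments of $\Sigma$ equal to one) and then applies the scaling relation between the moments of T and S. The function names and the value $c = 1/4$ are hypothetical choices.

```python
from fractions import Fraction


def series_mul(a, b, order):
    """Multiply two power series (coefficient lists), truncated at x**order."""
    out = [Fraction(0)] * (order + 1)
    for i, ai in enumerate(a[:order + 1]):
        for j, bj in enumerate(b[:order + 1 - i]):
            out[i + j] += ai * bj
    return out


def compose(outer, inner, order):
    """Coefficients of outer(inner(x)) up to x**order; inner(0) must be 0."""
    result = [Fraction(0)] * (order + 1)
    power = [Fraction(1)] + [Fraction(0)] * order  # inner(x)**0
    for coeff in outer[:order + 1]:
        result = [r + coeff * p for r, p in zip(result, power)]
        power = series_mul(power, inner, order)
    return result


def moments_of_T(sigma_moments, c):
    """Solve 1 - c + c*M_Sigma(x*M_T(x)) = M_T(x) by iterating on coefficients."""
    order = len(sigma_moments)
    M_sigma = [Fraction(1)] + [Fraction(a) for a in sigma_moments]
    M_T = [Fraction(1)] + [Fraction(0)] * order
    for _ in range(order):  # each pass fixes at least one more coefficient
        x_M_T = [Fraction(0)] + M_T[:order]  # coefficients of x * M_T(x)
        composed = compose(M_sigma, x_M_T, order)
        M_T = [c * v for v in composed]
        M_T[0] += 1 - c
    return M_T[1:]


c = Fraction(1, 4)                      # hypothetical value of the ratio n/m
T_moments = moments_of_T([1, 1, 1], c)  # Sigma = identity
S_moments = [a / c for a in T_moments]  # from M_T(x) = c (M_S(x) - 1) + 1
print(T_moments, S_moments)
```

The moments of S obtained this way are $1$, $1 + c$ and $1 + 3c + c^2$, evaluated at $c = 1/4$.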